Ranking from Crowdsourced Pairwise Comparisons via Matrix Manifold Optimization

Similar documents
Sketchy Decisions: Convex Low-Rank Matrix Optimization with Optimal Storage

Sparse and low-rank decomposition for big data systems via smoothed Riemannian optimization

Ranking from Crowdsourced Pairwise Comparisons via Smoothed Matrix Manifold Optimization

Lift me up but not too high: Fast algorithms to solve SDPs with block-diagonal constraints

Robust Principal Component Pursuit via Alternating Minimization Scheme on Matrix Manifolds

A Randomized Approach for Crowdsourcing in the Presence of Multiple Views

Lasso: Algorithms and Extensions

ECE G: Special Topics in Signal Processing: Sparsity, Structure, and Inference

Manifold optimization for k-means clustering

COR-OPT Seminar Reading List Sp 18

Collaborative Filtering Matrix Completion Alternating Least Squares

Linear Regression. Aarti Singh. Machine Learning / Sept 27, 2010

Non-convex optimization. Issam Laradji

Fantope Regularization in Metric Learning

CPSC 540: Machine Learning

Riemannian Optimization and its Application to Phase Retrieval Problem

ECE G: Special Topics in Signal Processing: Sparsity, Structure, and Inference

Decoupled Collaborative Ranking

Big Data Analytics: Optimization and Randomization

A Brief Overview of Practical Optimization Algorithms in the Context of Relaxation

Uncorrelated Multilinear Principal Component Analysis through Successive Variance Maximization

Rapid, Robust, and Reliable Blind Deconvolution via Nonconvex Optimization

The Power of Sparse and Low-Rank Optimization Paradigms for Network Densification

The non-convex Burer-Monteiro approach works on smooth semidefinite programs

An Extended Frank-Wolfe Method, with Application to Low-Rank Matrix Completion

Optimization. Benjamin Recht University of California, Berkeley Stephen Wright University of Wisconsin-Madison

8 Numerical methods for unconstrained problems

Accelerated Proximal Gradient Methods for Convex Optimization

Generalized Conditional Gradient and Its Applications

Higher-Order Methods

Convex Optimization Algorithms for Machine Learning in 10 Slides

Proximal Gradient Descent and Acceleration. Ryan Tibshirani Convex Optimization /36-725

Research Statement Qing Qu

II. DIFFERENTIABLE MANIFOLDS. Washington Mio CENTER FOR APPLIED VISION AND IMAGING SCIENCES

A Fast Augmented Lagrangian Algorithm for Learning Low-Rank Matrices

Accelerated Block-Coordinate Relaxation for Regularized Optimization

Nonlinear Optimization Methods for Machine Learning

Block Coordinate Descent for Regularized Multi-convex Optimization

Mirror Descent for Metric Learning. Gautam Kunapuli Jude W. Shavlik

Handbook of Semidefinite Programming: Theory, Algorithms, and Applications. Kluwer Academic Publishers, Boston/Dordrecht/London

Improving L-BFGS Initialization for Trust-Region Methods in Deep Learning

Proximal methods. S. Villa. October 7, 2014

Overparametrization for Landscape Design in Non-convex Optimization

ECE G: Special Topics in Signal Processing: Sparsity, Structure, and Inference

Composite nonlinear models at scale

On the convergence of higher-order orthogonality iteration and its extension

Nonlinear Optimization for Optimal Control

Methods that avoid calculating the Hessian. Nonlinear Optimization; Steepest Descent, Quasi-Newton. Steepest Descent

The Projected Power Method: An Efficient Algorithm for Joint Alignment from Pairwise Differences

Tutorial: PART 2. Optimization for Machine Learning. Elad Hazan Princeton University. + help from Sanjeev Arora & Yoram Singer

Characterization of Gradient Dominance and Regularity Conditions for Neural Networks

Conditional Gradient (Frank-Wolfe) Method

Lecture 5 : Projections

Projection methods to solve SDP

A tutorial on sparse modeling. Outline:

Proximal Newton Method. Ryan Tibshirani Convex Optimization /36-725

Learning Task Grouping and Overlap in Multi-Task Learning

arxiv: v1 [math.oc] 1 Jul 2016

ECS289: Scalable Machine Learning

arxiv: v2 [math.oc] 6 Jan 2016

Motivation Subgradient Method Stochastic Subgradient Method. Convex Optimization. Lecture 15 - Gradient Descent in Machine Learning

Riemannian Optimization Method on the Flag Manifold for Independent Subspace Analysis

Dropping Convexity for Faster Semi-definite Optimization

Coordinate Descent and Ascent Methods

Non-convex optimization with complexity guarantees via Newton + conjugate gradient

Proximal Newton Method. Zico Kolter (notes by Ryan Tibshirani) Convex Optimization

STUDY of genetic variations is a critical task to disease

Optimization for Machine Learning

The convex algebraic geometry of rank minimization

Inverse Problems and Optimal Design in Electricity and Magnetism

Support Vector Machines: Training with Stochastic Gradient Descent. Machine Learning Fall 2017

arxiv: v3 [math.oc] 19 Oct 2017

Stochastic Analogues to Deterministic Optimizers

Sparse Covariance Selection using Semidefinite Programming

A Unified Approach to Proximal Algorithms using Bregman Distance

Spectral Processing. Misha Kazhdan

Parallel Coordinate Optimization

Numerical Optimization Professor Horst Cerjak, Horst Bischof, Thomas Pock Mat Vis-Gra SS09

ECS289: Scalable Machine Learning

Large-scale Collaborative Ranking in Near-Linear Time

Complexity analysis of second-order algorithms based on line search for smooth nonconvex optimization

Optimization II: Unconstrained Multivariable

distances between objects of different dimensions

A Multilevel Proximal Algorithm for Large Scale Composite Convex Optimization

Matrix Completion: Fundamental Limits and Efficient Algorithms

A Riemannian low-rank method for optimization over semidefinite matrices with block-diagonal constraints

Tutorial on: Optimization I. (from a deep learning perspective) Jimmy Ba

Generalized Power Method for Sparse Principal Component Analysis

Conditional Gradient Algorithms for Rank-One Matrix Approximations with a Sparsity Constraint

STA141C: Big Data & High Performance Statistical Computing

Manifold Optimization Over the Set of Doubly Stochastic Matrices: A Second-Order Geometry

Randomized Coordinate Descent Methods on Optimization Problems with Linearly Coupled Constraints

Sparse Optimization Lecture: Basic Sparse Optimization Models

Collaborative Filtering

IPAM Summer School Optimization methods for machine learning. Jorge Nocedal

Stochastic Optimization Algorithms Beyond SG

Newton's Method. Ryan Tibshirani Convex Optimization /36-725

Agenda. Fast proximal gradient methods. 1 Accelerated first-order methods. 2 Auxiliary sequences. 3 Convergence analysis. 4 Numerical examples

Lecture 9: September 28

Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation

Transcription:

Ranking from Crowdsourced Pairwise Comparisons via Matrix Manifold Optimization. Jialin Dong, ShanghaiTech University. 1

Outline: Introduction; Four Vignettes: System Model and Problem Formulation; Problem Analysis: Convex Methods (Disadvantage: Why Not Convex Optimization?) and Scalable Nonconvex Optimization (Motivation: Why Nonconvex Optimization?); Matrix Manifold Optimization; Regularized Smoothed Maximum-Likelihood Estimation; Simulation Results; Summary. 2

Part I : Introduction 3

Ranking: Given n items, infer an ordering of the items based on partially sampled data. Applications (shown pictorially on the slide). 4

Crowdsourcing: Harness the power of human computation to solve tasks. Applications: machine learning models, clustering data, scene recognition. 5

Pairwise Measurements: Pairwise relations over a few object pairs, used when it is difficult to directly measure each individual object. Applications: localization, alignment, registration and synchronization, community detection... [Chen & Goldsmith, 14] 6

Part II : Four Vignettes 7

Vignettes A: System Model and Problem Formulation 8

System Model: m crowd users, n items to be ranked. Underlying weight matrix X: low-rank, unknown. Pairwise comparisons observed over a sampling set (illustrated on the slide). 9

System Model: BTL model [Bradley & Terry, 1952] with an associated score; individual ranking list for user i. 10
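
A minimal sketch of the comparison model, with notation assumed here since the slide's formulas did not survive transcription: writing X_{ij} for the score that user i associates with item j, the BTL model posits

\[ \Pr\big(\text{user } i \text{ prefers item } j \text{ over item } k\big) \;=\; \frac{\exp(X_{ij})}{\exp(X_{ij}) + \exp(X_{ik})}, \]

and the individual ranking list for user i orders the items by decreasing score X_{ij}.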

Problem Formulation: Maximum-likelihood estimation (MLE), with the negative log-likelihood function as the objective, gives a low-rank matrix optimization problem. How to solve this low-rank matrix optimization problem? 11
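
As a hedged sketch (the exact notation is assumed), suppose the observed comparisons form a sampling set Ω of triples (i, j, k) in which user i preferred item j over item k. Under the BTL model the negative log-likelihood is

\[ f(X) \;=\; \sum_{(i,j,k)\in\Omega} \log\!\big(1 + \exp(X_{ik} - X_{ij})\big), \]

and the MLE minimizes f(X) subject to a rank constraint on X (together with an element-wise bound on its entries), which is the low-rank matrix optimization problem the rest of the talk addresses.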

Vignettes B: Problem Analysis 12

Vignettes B.1: Convex Methods 13

Generic low-rank matrix optimization: a rank-constrained matrix optimization problem in which the measurement operator is a real-linear map on matrices and the loss is convex and differentiable. Challenge I: Reliably solve the low-rank matrix problem at scale. Challenge II: Develop optimization algorithms with optimal storage. 14
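
In symbols (a generic template, since the slide's formula is not recoverable from the transcription):

\[ \underset{X \in \mathbb{R}^{m\times n}}{\text{minimize}} \;\; f\big(\mathcal{A}(X)\big) \quad \text{subject to} \quad \operatorname{rank}(X) \le r, \]

where \mathcal{A} is a real-linear map on matrices and f is convex and differentiable.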

A brief biased history of convex methods. 1990s: Interior-point methods (computationally expensive); storage cost for the Hessian. 2000s: Convex first-order methods, e.g., (accelerated) proximal gradient, spectral bundle methods, and others; store the matrix variable. 2008-Present: Storage-efficient convex first-order methods, e.g., the conditional gradient method (CGM) and extensions; store the matrix in low-rank form after iterations, but no storage guarantees. Interior-point: Nemirovski & Nesterov 1994;... First-order: Rockafellar 1976; Helmberg & Rendl 1997; Auslender & Teboulle 2006;... CGM: Frank & Wolfe 1956; Levitin & Poljak 1967; Jaggi 2013;... 15

Convex Relaxation Approach: Nuclear norm relaxation to solve the original problem. Theoretical foundations: beautiful, nearly complete theory. Effective algorithms: spectral projected-gradient (SPG) [Davenport et al., 14], Newton-ADMM method [Ali et al., 17]. For problems that are not huge, use generic methods: high-level language support (CVX/CVXPY/Convex.jl) makes prototyping easy. 16
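
A sketch of the relaxation (the exact formulation used in the paper is assumed): replace the rank constraint with its convex surrogate, the nuclear norm, e.g.

\[ \underset{X}{\text{minimize}} \;\; f(X) + \lambda\,\|X\|_* \qquad \text{or} \qquad \underset{X}{\text{minimize}} \;\; f(X) \;\; \text{subject to} \;\; \|X\|_* \le \tau, \]

which is convex but still carries the full m × n matrix variable.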

Why Not Convex Optimization? Convex methods are slow memory hogs with high computational complexity: computationally expensive, and they raise a storage issue. 17

Vignettes B.2: Scalable Nonconvex Optimization 18

Recent advances in nonconvex optimization. 2009-Present: Nonconvex heuristics: the Burer-Monteiro factorization idea + various nonlinear programming methods; store low-rank matrix factors. Guaranteed solutions: global optimality under statistical assumptions. Matrix completion/recovery: [Sun-Luo 14], [Chen-Wainwright 15], [Ge-Lee-Ma 16]. Phase retrieval: [Candes et al., 15], [Chen-Candes 15], [Sun-Qu-Wright 16]. Community detection/phase synchronization: [Bandeira-Boumal-Voroninski 16], [Montanari et al., 17]. 19
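
The factorization idea, sketched with assumed notation: parametrize the rank-r matrix as X = U V^T with thin factors U ∈ R^{m×r} and V ∈ R^{n×r}, and optimize over the factors directly,

\[ \underset{U \in \mathbb{R}^{m\times r},\; V \in \mathbb{R}^{n\times r}}{\text{minimize}} \;\; f\big(U V^{\top}\big), \]

which is nonconvex in (U, V) but only requires storing the low-rank factors.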

Why Nonconvex Optimization? Convex methods: slow memory hogs; high computational complexity, e.g., singular value decomposition. Nonconvex methods: fast and lightweight; benign global geometry under certain statistical models; store low-rank matrix factors. 20

Vignettes C: Matrix Manifold Optimization 21

What is matrix manifold optimization? A matrix manifold (or Riemannian) optimization problem minimizes a smooth function over a Riemannian manifold: spheres, orthonormal bases (Stiefel), rotations, positive definite matrices, fixed-rank matrices, Euclidean distance matrices, semidefinite fixed-rank matrices, linear subspaces (Grassmann), phases, essential matrices, fixed-rank tensors, Euclidean spaces... How to reformulate the original problem as a Riemannian optimization problem? 22

Reformulation as a Riemannian optimization problem: the original problem is recast as a matrix manifold optimization problem with respect to the fixed-rank matrix manifold (which excludes lower-rank matrices such as the origin). Challenge: the nonsmooth element-wise infinity norm constraint. 23
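
A sketch with assumed notation: writing \mathcal{M}_r = \{X \in \mathbb{R}^{m\times n} : \operatorname{rank}(X) = r\} for the fixed-rank matrix manifold, the reformulated problem reads

\[ \underset{X \in \mathcal{M}_r}{\text{minimize}} \;\; f(X) \quad \text{subject to} \quad \|X\|_\infty \le \alpha, \qquad \|X\|_\infty = \max_{i,j} |X_{ij}|, \]

so the rank constraint is absorbed into the manifold, while the element-wise infinity norm constraint remains and is nonsmooth.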

Smoothing. Motivation: derive a smooth objective so that Riemannian optimization can be applied. 24
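
One standard smooth surrogate of the element-wise infinity norm is the log-sum-exp function (shown here as an illustrative sketch; the paper's exact parameterization is assumed):

\[ h_\rho(X) \;=\; \frac{1}{\rho}\,\log \sum_{i,j} \Big( e^{\rho X_{ij}} + e^{-\rho X_{ij}} \Big) \;\ge\; \|X\|_\infty, \]

which is smooth and approximates \|X\|_\infty to within (\log 2mn)/\rho as the smoothing parameter \rho grows.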

Regularization. Motivation: handle the constraint through the smoothed surrogate of the element-wise infinity norm, via a convex regularization function. How to develop the algorithm on the manifold? 25
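
A hedged sketch of the resulting formulation (the specific regularizer and weight are as in the paper and assumed here): move the smoothed constraint into the objective with a convex penalty, e.g.

\[ \underset{X \in \mathcal{M}_r}{\text{minimize}} \;\; f(X) + \lambda\, h_\rho(X), \]

which yields a smooth, regularized maximum-likelihood problem posed entirely on the fixed-rank manifold.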

Manifold optimization paradigms: Generalize the Euclidean gradient (Hessian) to the Riemannian gradient (Hessian); the key ingredients are the Riemannian gradient, the Euclidean gradient, and the retraction operator. We need Riemannian geometry: 1) linearize the search space into a tangent space; 2) pick a metric, i.e., a Riemannian metric, on the tangent space to give intrinsic notions of gradient and Hessian. 26
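
In formulas, with standard Riemannian-optimization notation (assumed): for a manifold \mathcal{M} embedded in a Euclidean space, the Riemannian gradient is the orthogonal projection of the Euclidean gradient onto the tangent space,

\[ \operatorname{grad} f(x) \;=\; \mathrm{P}_{T_x\mathcal{M}}\big(\nabla f(x)\big), \]

and a retraction R_x : T_x\mathcal{M} \to \mathcal{M} maps a tangent-space update back onto the manifold, so a basic iteration reads x_{t+1} = R_{x_t}\big(-\eta_t \operatorname{grad} f(x_t)\big).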

Optimization on the manifold: main idea (illustrated in a sequence of figures on slides 27-30).

Example: Rayleigh quotient. Optimization over the (sphere) manifold; the cost function is smooth on the sphere. Step 1: Compute the Euclidean gradient in the ambient space (with a symmetric matrix). Step 2: Compute the Riemannian gradient on the sphere by projecting the Euclidean gradient onto the tangent space using the orthogonal projector. 31
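
A minimal Python sketch of these two steps, assuming a fixed step size and normalization as the retraction (an illustration of the recipe above, not code from the paper; the function name is hypothetical):

import numpy as np

def rayleigh_riemannian_gd(A, x0, step=0.1, iters=500):
    """Minimize f(x) = x^T A x over the unit sphere by Riemannian gradient descent."""
    x = x0 / np.linalg.norm(x0)
    for _ in range(iters):
        egrad = 2 * A @ x                 # Step 1: Euclidean gradient of x^T A x
        rgrad = egrad - (x @ egrad) * x   # Step 2: project onto tangent space, (I - x x^T) egrad
        x = x - step * rgrad              # move along the negative Riemannian gradient
        x = x / np.linalg.norm(x)         # retraction: renormalize back onto the sphere
    return x

rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
A = (M + M.T) / 2                         # symmetric matrix
x_min = rayleigh_riemannian_gd(A, rng.standard_normal(5))
# x_min approaches the eigenvector associated with the smallest eigenvalue of A.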

Ranking problem in this paper: low-rank optimization for ranking via Riemannian optimization. How to efficiently compute the descent direction on the manifold? 32

Riemannian trust-region method: solve a trust-region subproblem, then update the iterate. 33
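
A sketch of one iteration in standard notation (assumed): at the current iterate x ∈ \mathcal{M}, solve the trust-region subproblem on the tangent space,

\[ \underset{\xi \in T_x\mathcal{M},\; \|\xi\| \le \Delta}{\text{minimize}} \;\; f(x) + \langle \operatorname{grad} f(x), \xi\rangle + \tfrac{1}{2}\,\langle \operatorname{Hess} f(x)[\xi], \xi\rangle, \]

typically only approximately (e.g., by truncated conjugate gradients), and then update the iterate by retraction, x^+ = R_x(\xi), accepting or rejecting the step and adjusting the radius \Delta according to the ratio of actual to predicted decrease.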

Vignettes D: Simulation Results 34

Algorithms & Metrics. Algorithms: the proposed Riemannian trust-region algorithm solving the log-sum-exp regularized problem (PRTRS); bi-factor gradient descent solving the log-barrier regularized problem (BFGDB) [Park et al., 16]; spectral projected-gradient (SPG) [Davenport et al., 14]. Metrics: sampling size, relative mean square error (relative MSE), and rescaled sample size. 35
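
For reference, the relative MSE reported on the following slides is presumably of the standard form (notation assumed)

\[ \text{relative MSE} \;=\; \frac{\|\widehat{X} - X^\natural\|_F^2}{\|X^\natural\|_F^2}, \]

where X^\natural is the ground-truth weight matrix and \widehat{X} is the estimate.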

Convergence rate (figure: objective value versus iteration for PRTRS, BFGDB [Park et al., 16], and SPG [Davenport et al., 14]). PRTRS: faster rate of convergence than the BFGDB algorithm; comparable with the SPG algorithm. 36

Performance. Conclusion: as the rescaled sample size increases, the relative MSE decreases; PRTRS shows better performance in terms of MSE than both SPG and BFGDB. 37

Computational Cost (figure: computational time for different sizes K). Conclusion: PRTRS has a dramatic advantage in computational time over both SPG and BFGDB. 38

Part III: Summary 39

Concluding remarks. Ranking from pairwise comparisons in a crowdsourcing system. Scalable nonconvex optimization algorithms: store low-rank matrix factors; global optimality under statistical assumptions. Matrix manifold optimization: smoothed regularized MLE; Riemannian trust-region algorithm. Outperforms state-of-the-art algorithms in performance (i.e., relative MSE), computational cost, and convergence rate. 40

Thanks 41