Supplementary Materials for Riemannian Pursuit for Big Matrix Recovery

Mingkui Tan, School of Computer Science, The University of Adelaide, Australia. TANMINGKUI@GMAIL.COM
Ivor W. Tsang, CIS, University of Technology Sydney, Australia. IVOR.TSANG@GMAIL.COM
Li Wang, Department of Mathematics, University of California San Diego, USA. LIW0@UCSD.EDU
Bart Vandereycken, Department of Mathematics, Princeton University, USA. BARTV@PRINCETON.EDU
Sinno Jialin Pan, Institute for Infocomm Research, Singapore. JSPAN@I2R.A-STAR.EDU.SG

Proceedings of the 31st International Conference on Machine Learning, Beijing, China, 2014. JMLR: W&CP volume 32. Copyright 2014 by the authors.

Abstract

In this supplementary file, we first present the parameter setting for $\rho$, and then give the proofs of the lemmas and theorems that appear in the main paper.

1. Parameter Setting for $\rho$

In the main paper, we presented a simple and effective method to choose $\rho$, motivated by the thresholding strategy in StOMP for sparse signal recovery (Donoho et al., 2012). Specifically, let $\sigma$ contain the singular values of $\mathcal{A}^*(\mathbf{b})$, arranged in descending order. We choose $\rho$ such that
$$\sigma_i \ge \eta\,\sigma_1, \quad \forall\, i \le \rho, \qquad (1)$$
where $\eta = 0.60$ is usually a good choice.

However, for big matrices it is not trivial to predict the number of singular values that satisfy (1) if we do not want to compute a full SVD. Since $\rho$ is in general small, we propose to compute the $\sigma_i$ sequentially until condition (1) is violated. Let $B$ be a small integer; we compute $B$ singular values per iteration. Basically, if the singular values computed so far still satisfy (1), we compute the next singular values $\sigma_{i+1},\dots,\sigma_{i+B}$ by performing a rank-$B$ truncated SVD on $\mathbf{A}_i = \mathcal{A}^*(\mathbf{b}) - \sum_{j=1}^{i}\sigma_j\mathbf{u}_j\mathbf{v}_j^T$ using PROPACK. In practice, we suggest setting $B$ to a small integer. The schematic of the Sequential Truncated SVD for setting $\rho$ is presented in Algorithm 1. Notice that PROPACK involves only matrix-vector products with $\mathbf{A}_i$ and $\mathbf{A}_i^T$; for the low-rank term in $\mathbf{A}_i$, these products can be computed as $\mathbf{U}(\mathrm{diag}(\sigma)(\mathbf{V}^T\mathbf{r}))$ and $\mathbf{V}(\mathrm{diag}(\sigma)(\mathbf{U}^T\mathbf{r}))$ for a vector $\mathbf{r}$. We remark that, instead of Algorithm 1, a more efficient technique may involve restarting the Krylov-based method, like PROPACK, with an increasingly larger subspace until (1) is satisfied.

Algorithm 1 Sequential Truncated SVD for Setting $\rho$.
1: Given $\eta$ and $\mathcal{A}^*(\mathbf{b})$, initialize $\rho = 2$ and $B > 1$.
2: Do the rank-2 truncated SVD on $\mathcal{A}^*(\mathbf{b})$, obtaining $\sigma \in \mathbb{R}^2$, $\mathbf{U} \in \mathbb{R}^{m\times 2}$ and $\mathbf{V} \in \mathbb{R}^{n\times 2}$.
3: If $\sigma_\rho < \eta\,\sigma_1$, stop and return $\rho = 1$.
4: while $\sigma_\rho \ge \eta\,\sigma_1$ do
5:   Let $\rho = \rho + B$.
6:   Do the rank-$B$ truncated SVD on $\mathcal{A}^*(\mathbf{b}) - \mathbf{U}\,\mathrm{diag}(\sigma)\,\mathbf{V}^T$, obtaining $\sigma_B \in \mathbb{R}^B$, $\mathbf{U}_B \in \mathbb{R}^{m\times B}$ and $\mathbf{V}_B \in \mathbb{R}^{n\times B}$.
7:   Let $\mathbf{U} = [\mathbf{U}\;\mathbf{U}_B]$, $\mathbf{V} = [\mathbf{V}\;\mathbf{V}_B]$ and $\sigma = [\sigma\;\sigma_B]$.
8: end while
9: Return the largest $\rho$ such that (1) holds.
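For concreteness, a small Python sketch of Algorithm 1 is given below. It uses scipy.sparse.linalg.svds as a stand-in for PROPACK and, for simplicity, forms the deflated residual explicitly as a dense matrix; in the actual solver the residual would only be applied through matrix-vector products as described above. The function name and the safeguard rho_max are ours and purely illustrative, not from the paper.

import numpy as np
from scipy.sparse.linalg import svds

def sequential_truncated_svd_rho(Ab, eta=0.60, B=5, rho_max=50):
    # Estimate rho = #{i : sigma_i >= eta * sigma_1} without a full SVD.
    # Ab stands in for A*(b) as a dense (m, n) array.
    U, s, Vt = svds(Ab, k=1)                 # leading singular triplet
    sigma = s[::-1]                          # svds returns ascending order
    U, Vt = U[:, ::-1], Vt[::-1, :]
    sigma1 = sigma[0]

    while sigma[-1] >= eta * sigma1 and sigma.size < rho_max:
        # Deflate the part already computed and ask for B more triplets.
        residual = Ab - (U * sigma) @ Vt
        U_B, s_B, Vt_B = svds(residual, k=B)
        order = np.argsort(s_B)[::-1]
        sigma = np.concatenate([sigma, s_B[order]])
        U = np.hstack([U, U_B[:, order]])
        Vt = np.vstack([Vt, Vt_B[order, :]])

    # Number of singular values satisfying condition (1).
    return int(np.sum(sigma >= eta * sigma1))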

2. Main Theoretical Results in the Paper

We first repeat the main results in the paper before we prove Lemma 1 and Theorem 1 (the other results were already proven in the paper).

Proposition 1. In MC (matrix completion), suppose the observed entry set $\Xi$ is sampled according to the Bernoulli model, with each entry $(i,j) \in \Xi$ being drawn independently with probability $p$. There exists a constant $C_0 > 0$ such that, for all $\gamma_r \in (0,1)$, $\mu_B \ge 1$ and $n \ge m \ge 3$, if $p \ge C_0\,\mu_B^2 r^2 \log(n)/(\gamma_r^2 m)$, then the following RIP condition holds:
$$(1-\gamma_r)\,p\,\|\mathbf{X}\|_F^2 \;\le\; \|P_\Xi(\mathbf{X})\|_F^2 \;\le\; (1+\gamma_r)\,p\,\|\mathbf{X}\|_F^2, \qquad (2)$$
for any $\mu_B$-incoherent matrix $\mathbf{X} \in \mathbb{R}^{m\times n}$ of rank at most $r$, with probability at least $1 - \exp(-n\log n)$.

Lemma 1. Let $\{\mathbf{X}^t\}$ be the sequence generated by RP. Then
$$f(\mathbf{X}^t) - f(\mathbf{X}^{t+1}) \;\ge\; \frac{\tau_t}{2}\,\|\mathbf{H}_2^t\|_F^2, \qquad (3)$$
where $\tau_t$ satisfies the line-search condition in the paper.

Theorem 1. Let $\{\mathbf{X}^t\}$ be the sequence generated by RP and $\zeta = \min\{\tau_1, \dots, \tau_\iota\}$. As long as $f(\mathbf{X}^t) \ge \frac{C}{2}\|\mathbf{e}\|_2^2$ with $C > 1$, and if there exists an integer $\iota > 0$ such that $2\gamma_{\bar r + 2\iota\rho} < 1$, then RP decreases linearly in objective value when $t < \iota$, namely $f(\mathbf{X}^{t+1}) \le \nu f(\mathbf{X}^t)$, where
$$\nu = 1 - \frac{\rho\,\zeta}{2\bar r}\cdot\frac{1 - 2\gamma_{\bar r + 2\iota\rho}}{1 + \gamma_{\bar r + 2\iota\rho}}.$$

Proposition 2 (Sato & Iwai, 2013). Given the retraction and vector transport on $M_r$ defined in the paper, there exists a step size $\theta_k$ that satisfies the strong Wolfe conditions of the paper.

Lemma 2. If $c_2 < \frac{1}{2}$, then the search direction $\zeta_k$ generated by NRCG with the Fletcher–Reeves rule and strong Wolfe step-size control satisfies
$$-\frac{1}{1-c_2} \;\le\; \frac{\langle \operatorname{grad} f(\mathbf{X}_k), \zeta_k\rangle}{\langle \operatorname{grad} f(\mathbf{X}_k), \operatorname{grad} f(\mathbf{X}_k)\rangle} \;\le\; \frac{2c_2 - 1}{1-c_2}. \qquad (4)$$

Theorem 2. Let $\{\mathbf{X}_k\}$ be the sequence generated by NRCG with the strong Wolfe line search, where $0 < c_1 < c_2 < \frac{1}{2}$. Then $\liminf_{k\to\infty}\|\operatorname{grad} f(\mathbf{X}_k)\| = 0$.
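Before turning to the proofs, the concentration behind the RIP condition (2) of Proposition 1 is easy to observe numerically: for a random low-rank matrix with incoherent factors and a Bernoulli-sampled entry set, $\|P_\Xi(\mathbf{X})\|_F^2$ stays close to $p\|\mathbf{X}\|_F^2$. The following short Monte Carlo check is only an illustration; the dimensions, rank, sampling rate and seed are arbitrary.

import numpy as np

rng = np.random.default_rng(0)
m, n, r, p = 500, 600, 5, 0.3

# Random rank-r matrix; QR factors of Gaussian matrices are typically incoherent.
U, _ = np.linalg.qr(rng.standard_normal((m, r)))
V, _ = np.linalg.qr(rng.standard_normal((n, r)))
X = U @ np.diag(rng.uniform(1.0, 2.0, r)) @ V.T

mask = rng.random((m, n)) < p        # Bernoulli model for the observed set Xi
ratio = np.sum((mask * X) ** 2) / (p * np.sum(X ** 2))
print(ratio)                         # close to 1, i.e. a small gamma_r in (2)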

3. Proof of Lemma 1 in the Paper

The step size $\tau_t$ is determined such that
$$f\big(R_{\mathbf{X}^t}(\tau_t\mathbf{H}^t)\big) \;\le\; f(\mathbf{X}^t) - \frac{\tau_t}{2}\,\langle\mathbf{H}^t, \mathbf{H}^t\rangle.$$
Since $\langle\mathbf{H}_1^t, \mathbf{H}_2^t\rangle = 0$, it follows that
$$f(\mathbf{X}^t) - f(\mathbf{X}^{t+1}) \;\ge\; \frac{\tau_t}{2}\,\langle\mathbf{H}^t, \mathbf{H}^t\rangle \;=\; \frac{\tau_t}{2}\big(\|\mathbf{H}_1^t\|_F^2 + \|\mathbf{H}_2^t\|_F^2\big) \;\ge\; \frac{\tau_t}{2}\,\|\mathbf{H}_2^t\|_F^2,$$
where the last inequality holds with equality when $\mathbf{H}_1^t = 0$, which happens if we solve the master problem exactly.

4. Proof of Theorem 1 in the Paper

4.1. Key Lemmas

To complete the proof of Theorem 1, we first recall a property of the orthogonal projection $P_{T_{\mathbf{X}}M_r}(\mathbf{Z})$.

Lemma 3. Given the orthogonal projection $P_{T_{\mathbf{X}}M_r}(\mathbf{Z}) = P_U\mathbf{Z}P_V + P_U^{\perp}\mathbf{Z}P_V + P_U\mathbf{Z}P_V^{\perp}$, we have $\mathrm{rank}(P_{T_{\mathbf{X}}M_r}(\mathbf{Z})) \le 2\min(\mathrm{rank}(\mathbf{Z}), \mathrm{rank}(P_U))$ for any $\mathbf{X}$. In addition, we have $\|P_{T_{\mathbf{X}}M_r}(\mathbf{Z})\|_F \le \|\mathbf{Z}\|_F$ for any $\mathbf{Z} \in \mathbb{R}^{m\times n}$.

Proof. According to (Shalit et al., 2012), the three terms in $P_{T_{\mathbf{X}}M_r}(\mathbf{Z})$ are orthogonal to each other. Since $\mathrm{rank}(P_U) = \mathrm{rank}(P_V)$, we have
$$\mathrm{rank}(P_{T_{\mathbf{X}}M_r}(\mathbf{Z})) = \mathrm{rank}(\mathbf{Z}P_V + P_U\mathbf{Z}P_V^{\perp}) \le \min(\mathrm{rank}(\mathbf{Z}), \mathrm{rank}(P_V)) + \min(\mathrm{rank}(\mathbf{Z}), \mathrm{rank}(P_U)) = 2\min(\mathrm{rank}(\mathbf{Z}), \mathrm{rank}(P_U)).$$
The relation $\|P_{T_{\mathbf{X}}M_r}(\mathbf{Z})\|_F \le \|\mathbf{Z}\|_F$ follows immediately from the fact that $P_{T_{\mathbf{X}}M_r}$ is an orthogonal projection for the Frobenius norm.

4.2. Notation

We first introduce some notation. Let $\bar{\mathbf{X}}$ and $\mathbf{e}$ be the ground-truth low-rank matrix and the additive noise, respectively. Moreover, let $\{\mathbf{X}^t\}$ be the sequence generated by RP, $\xi^t = \mathcal{A}(\mathbf{X}^t) - \mathbf{b}$ and $\mathbf{G}^t = \mathcal{A}^*(\xi^t)$. In RP, we solve the fixed-rank subproblem approximately by the NRCG method. Recall the definition of the orthogonal projection onto the tangent space at $\mathbf{X} = \mathbf{U}\mathbf{S}\mathbf{V}^T$:
$$P_{T_{\mathbf{X}}M_r}(\mathbf{Z}) := P_{T_{\mathbf{X}}}(\mathbf{Z}) = P_U\mathbf{Z}P_V + P_U^{\perp}\mathbf{Z}P_V + P_U\mathbf{Z}P_V^{\perp} = P_U\mathbf{Z} + \mathbf{Z}P_V - P_U\mathbf{Z}P_V,$$
where $P_U = \mathbf{U}\mathbf{U}^T$ and $P_V = \mathbf{V}\mathbf{V}^T$. In addition, denote by $P_{T_{\mathbf{X}}}^{\perp}$ the complement of $P_{T_{\mathbf{X}}}$, namely $P_{T_{\mathbf{X}}}^{\perp}(\mathbf{Z}) = (\mathbf{I} - P_U)\mathbf{Z}(\mathbf{I} - P_V)$.

Now, recalling $\mathbf{E}^t = P_{T_{\mathbf{X}^t}M_{t\rho}}(\mathbf{G}^t) = P_{T_{\mathbf{X}^t}M_{t\rho}}(\mathcal{A}^*(\xi^t))$, we have
$$\langle\mathbf{X}^t, \mathcal{A}^*(\xi^t)\rangle = \langle\mathbf{X}^t, \mathbf{E}^t\rangle. \qquad (5)$$
At the $t$-th iteration, $\mathrm{rank}(\mathbf{X}^t) = t\rho$; thus $\mathbf{X}^t - \bar{\mathbf{X}}$ is at most of rank $\bar r + t\rho$, where $\bar r = \mathrm{rank}(\bar{\mathbf{X}})$. By the orthogonal projection $P_{T_{\mathbf{X}^t}}$, we decompose $\bar{\mathbf{X}}$ into two mutually orthogonal matrices:
$$\bar{\mathbf{X}} = \bar{\mathbf{X}}_1^t + \bar{\mathbf{X}}_2^t, \quad \text{where } \bar{\mathbf{X}}_1^t = P_{T_{\mathbf{X}^t}}(\bar{\mathbf{X}}) \text{ and } \bar{\mathbf{X}}_2^t = P_{T_{\mathbf{X}^t}}^{\perp}(\bar{\mathbf{X}}). \qquad (6)$$
Based on the above decomposition, it follows that $\langle\mathbf{X}^t - \bar{\mathbf{X}}_1^t, \bar{\mathbf{X}}_2^t\rangle = 0$.

Without loss of generality, we assume that $\bar r \ge t\rho$. According to Lemma 3, we have $\mathrm{rank}(\bar{\mathbf{X}}_1^t) \le 2\min(\mathrm{rank}(\bar{\mathbf{X}}), \mathrm{rank}(P_U)) = 2t\rho$ and $\mathrm{rank}(\bar{\mathbf{X}}_2^t) \le \bar r$. Moreover, since the column and row spaces of $\mathbf{X}^t$ and of $\bar{\mathbf{X}}_1^t = P_{T_{\mathbf{X}^t}}(\bar{\mathbf{X}})$ are contained in those spanned by the tangent space at $\mathbf{X}^t$, we have $\mathrm{rank}(\mathbf{X}^t - P_{T_{\mathbf{X}^t}}(\bar{\mathbf{X}})) \le 2t\rho$.

Note that, at the $(t+1)$-th iteration of RP, we increase the rank of $\mathbf{X}^t$ by $\rho$ by performing a line search along $\mathbf{H}^t = \mathbf{H}_1^t + \mathbf{H}_2^t$, where $\mathbf{H}_1^t = -P_{T_{\mathbf{X}^t}}(\mathbf{G}^t)$ and $\mathbf{H}_2^t = -\mathbf{U}_\rho\,\mathrm{diag}(\sigma_\rho)\,\mathbf{V}_\rho^T \in \mathrm{Ran}(P_{T_{\mathbf{X}^t}}^{\perp})$. For convenience of description, we define an internal variable $\mathbf{Z}^{t+1} \in \mathbb{R}^{m\times n}$ as
$$\mathbf{Z}^{t+1} = \mathbf{X}^t - \tau_t\mathbf{G}^t, \qquad (7)$$
where $\tau_t$ is the step size used in the line search. Based on the decomposition of $\mathbf{H}^t$, we decompose $\mathbf{Z}^{t+1}$ into
$$\mathbf{Z}^{t+1} = \mathbf{Z}_1^{t+1} + \mathbf{Z}_2^{t+1} + \mathbf{Z}_R^{t+1}, \qquad (8)$$
where
$$\mathbf{Z}_1^{t+1} = P_{T_{\mathbf{X}^t}}(\mathbf{Z}^{t+1}) = \mathbf{X}^t - \tau_t P_{T_{\mathbf{X}^t}}(\mathbf{G}^t), \quad \mathbf{Z}_2^{t+1} = P_{\bar T_{\mathbf{X}^t}}(\mathbf{Z}^{t+1}) = -\tau_t P_{\bar T_{\mathbf{X}^t}}(\mathbf{G}^t), \quad \mathbf{Z}_R^{t+1} = (\mathbf{I} - P_{T_{\mathbf{X}^t}} - P_{\bar T_{\mathbf{X}^t}})(\mathbf{Z}^{t+1}), \qquad (9)$$
with $P_{\bar T_{\mathbf{X}^t}}$ the orthogonal projection onto the subspace spanned by the column and row spaces of $\bar{\mathbf{X}}_2^t$. Observe that, from (6), we have $(\mathbf{X}^t)^T\bar{\mathbf{X}}_2^t = \mathbf{X}^t(\bar{\mathbf{X}}_2^t)^T = \mathbf{0}$. Hence
$$\mathrm{Ran}(P_{T_{\mathbf{X}^t}}) \perp \mathrm{Ran}(P_{\bar T_{\mathbf{X}^t}}) \perp \mathrm{Ran}(\mathbf{I} - P_{T_{\mathbf{X}^t}} - P_{\bar T_{\mathbf{X}^t}}), \qquad (10)$$
which implies that the three matrices above are mutually orthogonal. Similarly, $\mathbf{G}^t$ is decomposed into three mutually orthogonal parts, $\mathbf{G}^t = \mathbf{G}_1^t + \mathbf{G}_2^t + \mathbf{G}_R^t$, where
$$\mathbf{G}_1^t = P_{T_{\mathbf{X}^t}}(\mathbf{G}^t), \quad \mathbf{G}_2^t = P_{\bar T_{\mathbf{X}^t}}(\mathbf{G}^t), \quad \mathbf{G}_R^t = (\mathbf{I} - P_{T_{\mathbf{X}^t}} - P_{\bar T_{\mathbf{X}^t}})(\mathbf{G}^t). \qquad (11)$$
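As a concrete sanity check of the projections just introduced, the following NumPy sketch builds $P_{T_{\mathbf{X}}M_r}(\mathbf{Z}) = P_U\mathbf{Z} + \mathbf{Z}P_V - P_U\mathbf{Z}P_V$ and its complement $(\mathbf{I}-P_U)\mathbf{Z}(\mathbf{I}-P_V)$ for a random rank-$r$ point, and verifies the two properties of Lemma 3 together with the orthogonality of the two components. The dimensions, seed and function names are illustrative only, not taken from the paper.

import numpy as np

rng = np.random.default_rng(1)
m, n, r = 60, 80, 4

# A rank-r point X = U S V^T on M_r; only the projectors P_U, P_V are needed.
U, _ = np.linalg.qr(rng.standard_normal((m, r)))
V, _ = np.linalg.qr(rng.standard_normal((n, r)))
PU, PV = U @ U.T, V @ V.T

def proj_tangent(Z):
    # P_{T_X M_r}(Z) = P_U Z + Z P_V - P_U Z P_V
    return PU @ Z + Z @ PV - PU @ Z @ PV

def proj_tangent_perp(Z):
    # Complement: (I - P_U) Z (I - P_V)
    return (np.eye(m) - PU) @ Z @ (np.eye(n) - PV)

Z = rng.standard_normal((m, n))
T = proj_tangent(Z)

print(np.linalg.matrix_rank(T) <= 2 * r)                     # Lemma 3: rank at most 2r
print(np.linalg.norm(T, 'fro') <= np.linalg.norm(Z, 'fro'))  # Lemma 3: non-expansive
print(abs(np.sum(T * proj_tangent_perp(Z))) < 1e-8)          # the two parts are orthogonal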

4.3. Proof of Theorem 1

The proof of Theorem 1 involves three bounds on $f(\mathbf{X}^t)$, in terms of $\bar{\mathbf{X}}_2^t$, $\mathbf{Z}_2^{t+1}$ and $\|\mathbf{H}_2^t\|_F$, respectively. For convenience, we first list these bounds and use them to complete the proof of Theorem 1; the detailed proofs of the three bounds are deferred to Section 4.4.

First, the following lemma gives the bound on $f(\mathbf{X}^t)$ in terms of $\bar{\mathbf{X}}_2^t$.

Lemma 4. At the $t$-th iteration, if $\gamma_{\bar r+2t\rho} < 1/2$, then
$$f(\mathbf{X}^t) \;\ge\; \frac{1-2\gamma_{\bar r+2t\rho}}{1+\gamma_{\bar r+2t\rho}}\,\|\bar{\mathbf{X}}_2^t\|_F^2. \qquad (12)$$

The following lemma bounds $f(\mathbf{X}^t)$ w.r.t. $\mathbf{Z}_2^{t+1}$.

Lemma 5. Suppose $\|\mathbf{E}^t\|_F$ is sufficiently small, with $\mathbf{E}^t = P_{T_{\mathbf{X}^t}}(\mathbf{G}^t)$. For $\gamma_{\bar r+2t\rho} < 1/2$ and $C > 1$, we have
$$\|\mathbf{Z}_2^{t+1}\|_F^2 \;\ge\; \tau_t^2\,\frac{1-2\gamma_{\bar r+2t\rho}}{1+\gamma_{\bar r+2t\rho}}\,f(\mathbf{X}^t).$$

By combining Lemma 4 and Lemma 5, we shall show the following bound on $f(\mathbf{X}^t)$ w.r.t. $\mathbf{H}_2^t$.

Lemma 6. If $2\gamma_{\bar r+2t\rho} < 1$, then at the $t$-th iteration we have
$$\|\mathbf{H}_2^t\|_F^2 \;\ge\; \frac{\rho}{\bar r}\cdot\frac{1-2\gamma_{\bar r+2t\rho}}{1+\gamma_{\bar r+2t\rho}}\,f(\mathbf{X}^t).$$

Proof of Theorem 1. By combining Lemma 1 and Lemma 6, we have
$$f(\mathbf{X}^{t+1}) \;\le\; f(\mathbf{X}^t) - \frac{\tau_t}{2}\|\mathbf{H}_2^t\|_F^2 \;\le\; \Big(1 - \frac{\rho\tau_t}{2\bar r}\cdot\frac{1-2\gamma_{\bar r+2t\rho}}{1+\gamma_{\bar r+2t\rho}}\Big)\,f(\mathbf{X}^t).$$
Here $\tau_t$ is the step size obtained by the line search; with $\zeta = \min\{\tau_1, \dots, \tau_\iota\}$, the above relation holds for each $t < \iota$, where $2\gamma_{\bar r+2\iota\rho} < 1$. Note that $\frac{1-2\gamma_{\bar r+2t\rho}}{1+\gamma_{\bar r+2t\rho}}$ is decreasing w.r.t. $\gamma_{\bar r+2t\rho}$ on $(0, 1/2)$. In addition, since $\gamma_{\bar r+2t\rho} \le \gamma_{\bar r+2\iota\rho}$ holds for all $t \le \iota$, the following relation holds if $2\gamma_{\bar r+2\iota\rho} < 1$ and $t < \iota$:
$$f(\mathbf{X}^{t+1}) \;\le\; \Big(1 - \frac{\rho\zeta}{2\bar r}\cdot\frac{1-2\gamma_{\bar r+2\iota\rho}}{1+\gamma_{\bar r+2\iota\rho}}\Big)\,f(\mathbf{X}^t).$$
This completes the proof of Theorem 1.

4.4. Detailed Proof of the Three Bounds

4.4.1. KEY LEMMAS

To proceed, we need to recall a property of the constant $\gamma_r$ in the RIP.

Lemma 7 (Lemma 3.3 in Candès & Plan, 2011). For all $\mathbf{X}_P, \bar{\mathbf{X}} \in \mathbb{R}^{m\times n}$ satisfying $\mathrm{rank}(\mathbf{X}_P) \le r_p$, $\mathrm{rank}(\bar{\mathbf{X}}) \le r_q$ and $\langle\mathbf{X}_P, \bar{\mathbf{X}}\rangle = 0$, we have
$$|\langle\mathcal{A}(\mathbf{X}_P), \mathcal{A}(\bar{\mathbf{X}})\rangle| \;\le\; \gamma_{r_p+r_q}\,\|\mathbf{X}_P\|_F\,\|\bar{\mathbf{X}}\|_F.$$
In addition, for any two integers $r \le s$, we have $\gamma_r \le \gamma_s$ (Dai & Milenkovic, 2009).

Suppose $\mathbf{b}_q = \mathcal{A}(\bar{\mathbf{X}})$ for some $\bar{\mathbf{X}}$ with $\mathrm{rank}(\bar{\mathbf{X}}) = r_q$. Define $\mathbf{b}_p = \mathcal{A}(\mathbf{X}_P)$, where $\mathbf{X}_P$ is the optimal solution of the following problem:
$$\min_{\mathbf{X}}\; \|\mathcal{A}(\mathbf{X}) - \mathcal{A}(\bar{\mathbf{X}})\|_2^2, \quad \text{s.t. } \mathrm{rank}(\mathbf{X}) = r_p,\; \langle\mathbf{X}, \bar{\mathbf{X}}\rangle = 0. \qquad (13)$$
Let $\mathbf{b}_r = \mathbf{b}_p - \mathbf{b}_q = \mathcal{A}(\mathbf{X}_P) - \mathcal{A}(\bar{\mathbf{X}})$; then the following relations hold.

Lemma 8. With $\mathbf{b}_q$, $\mathbf{b}_p$ and $\mathbf{b}_r$ defined above, if $\gamma_{\max(r_p,r_q)} + \gamma_{r_p+r_q} < 1$, then
$$\|\mathbf{b}_p\|_2 \;\le\; \frac{\gamma_{r_p+r_q}}{1-\gamma_{\max(r_p,r_q)}}\,\|\mathbf{b}_q\|_2, \qquad (14)$$
and
$$\sqrt{1 - \frac{\gamma_{r_p+r_q}^2}{(1-\gamma_{\max(r_p,r_q)})^2}}\;\|\mathbf{b}_q\|_2 \;\le\; \|\mathbf{b}_r\|_2 \;\le\; \|\mathbf{b}_q\|_2. \qquad (15)$$

Proof. Since $\langle\bar{\mathbf{X}}, \mathbf{X}_P\rangle = 0$, with Lemma 7 we have
$$|\mathbf{b}_p^T\mathbf{b}_q| = |\langle\mathcal{A}(\mathbf{X}_P), \mathcal{A}(\bar{\mathbf{X}})\rangle| \le \gamma_{r_p+r_q}\|\mathbf{X}_P\|_F\|\bar{\mathbf{X}}\|_F \le \gamma_{r_p+r_q}\,\frac{\|\mathbf{b}_p\|_2}{\sqrt{1-\gamma_{r_p}}}\cdot\frac{\|\mathbf{b}_q\|_2}{\sqrt{1-\gamma_{r_q}}} \le \frac{\gamma_{r_p+r_q}}{1-\gamma_{\max(r_p,r_q)}}\,\|\mathbf{b}_p\|_2\|\mathbf{b}_q\|_2.$$
Now we show that $\mathbf{b}_p^T\mathbf{b}_r = 0$. Let $\mathbf{X}_P = \mathbf{U}\mathrm{diag}(\sigma)\mathbf{V}^T$. Since $\mathbf{X}_P$ is the minimizer of (13), $\sigma$ is also the minimizer of the following problem:
$$\min_{\sigma}\; \|\mathbf{b}_q - \mathbf{D}\sigma\|_2^2, \qquad (16)$$
where $\mathbf{D} = [\mathcal{A}(\mathbf{u}_1\mathbf{v}_1^T),\dots,\mathcal{A}(\mathbf{u}_{r_p}\mathbf{v}_{r_p}^T)]$. The Galerkin condition for this linear least-squares system states that $\mathbf{b}^T(\mathbf{D}\sigma - \mathbf{b}_q) = 0$ for any $\mathbf{b}$ in the column span of $\mathbf{D}$. Since $\mathbf{b}_p = \mathcal{A}(\mathbf{X}_P)$ is included in the span of $\mathbf{D}$, we obtain $\mathbf{b}_p^T\mathbf{b}_r = 0$. Recall now $\mathbf{b}_r = \mathbf{b}_p - \mathbf{b}_q$. It then follows that $\mathbf{b}_p^T\mathbf{b}_q = \mathbf{b}_p^T(\mathbf{b}_p - \mathbf{b}_r) = \|\mathbf{b}_p\|_2^2$. Accordingly, we have
$$\|\mathbf{b}_p\|_2 \;\le\; \frac{\gamma_{r_p+r_q}}{1-\gamma_{\max(r_p,r_q)}}\,\|\mathbf{b}_q\|_2.$$
Using the reverse triangle inequality $\|\mathbf{b}_r\|_2 = \|\mathbf{b}_q - \mathbf{b}_p\|_2 \ge \|\mathbf{b}_q\|_2 - \|\mathbf{b}_p\|_2$, we obtain
$$\|\mathbf{b}_r\|_2 \;\ge\; \Big(1 - \frac{\gamma_{r_p+r_q}}{1-\gamma_{\max(r_p,r_q)}}\Big)\,\|\mathbf{b}_q\|_2.$$
By the Galerkin condition of (16), we have $\|\mathbf{b}_q\|_2^2 = \|\mathbf{b}_r\|_2^2 + \|\mathbf{b}_p\|_2^2$, and we obtain
$$\sqrt{1 - \frac{\gamma_{r_p+r_q}^2}{(1-\gamma_{\max(r_p,r_q)})^2}}\;\|\mathbf{b}_q\|_2 \;\le\; \|\mathbf{b}_r\|_2 \;\le\; \|\mathbf{b}_q\|_2.$$
Finally, the condition $\gamma_{\max(r_p,r_q)} + \gamma_{r_p+r_q} < 1$ ensures the positivity of $1 - \frac{\gamma_{r_p+r_q}}{1-\gamma_{\max(r_p,r_q)}}$. This completes the proof.
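The Galerkin orthogonality used in the proof of Lemma 8 ($\mathbf{b}_p^T\mathbf{b}_r = 0$, hence $\|\mathbf{b}_q\|_2^2 = \|\mathbf{b}_p\|_2^2 + \|\mathbf{b}_r\|_2^2$) is simply the normal-equations property of a linear least-squares fit, as the following NumPy check on a random instance illustrates; the sizes and variable names here are ours, not from the paper.

import numpy as np

rng = np.random.default_rng(2)
d, k = 200, 6                      # ambient dimension and number of columns of D

D = rng.standard_normal((d, k))    # plays the role of D = [A(u_1 v_1^T), ..., A(u_rp v_rp^T)]
b_q = rng.standard_normal(d)

sigma, *_ = np.linalg.lstsq(D, b_q, rcond=None)   # least-squares coefficients, as in (16)
b_p = D @ sigma                    # projection of b_q onto span(D)
b_r = b_p - b_q

print(abs(b_p @ b_r))                                    # ~0: Galerkin condition
print(np.linalg.norm(b_q)**2 -
      (np.linalg.norm(b_p)**2 + np.linalg.norm(b_r)**2)) # ~0: Pythagoras identity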

4.4.2. PROOF OF LEMMA 4

Recall (6). Since $\mathbf{b} = \mathcal{A}(\bar{\mathbf{X}}) + \mathbf{e}$, we have
$$\sqrt{2f(\mathbf{X}^t)} = \|\mathcal{A}(\mathbf{X}^t) - \mathbf{b}\|_2 = \|\mathcal{A}(\mathbf{X}^t - \bar{\mathbf{X}}) - \mathbf{e}\|_2 \ge \|\mathcal{A}(\mathbf{X}^t - \bar{\mathbf{X}})\|_2 - \|\mathbf{e}\|_2 = \|\mathcal{A}(\mathbf{X}^t - \bar{\mathbf{X}}_1^t) - \mathcal{A}(\bar{\mathbf{X}}_2^t)\|_2 - \|\mathbf{e}\|_2.$$
Note that $\langle\mathbf{X}^t - \bar{\mathbf{X}}_1^t, \bar{\mathbf{X}}_2^t\rangle = 0$ and $\mathrm{rank}(\mathbf{X}^t - \bar{\mathbf{X}}_1^t) \le 2t\rho$. By applying Lemma 8, where we let $\mathbf{b}_q = \mathcal{A}(\bar{\mathbf{X}}_2^t)$ and $\mathbf{b}_r = \mathcal{A}(\bar{\mathbf{X}}_2^t) - \mathcal{A}(\mathbf{X}_P)$, with $\mathbf{X}_P$ the solution of (13) for this choice of $\mathbf{b}_q$, it follows that
$$\sqrt{2f(\mathbf{X}^t)} \ge \min_{\mathrm{rank}(\mathbf{X})=2t\rho,\;\langle\mathbf{X},\bar{\mathbf{X}}_2^t\rangle=0} \|\mathcal{A}(\mathbf{X}) - \mathcal{A}(\bar{\mathbf{X}}_2^t)\|_2 - \|\mathbf{e}\|_2$$
$$\ge \sqrt{1 - \frac{\gamma_{\bar r+2t\rho}^2}{(1-\gamma_{\max(2t\rho,\bar r)})^2}}\;\|\mathcal{A}(\bar{\mathbf{X}}_2^t)\|_2 - \|\mathbf{e}\|_2 \quad (\text{by Lemma 8})$$
$$\ge \sqrt{1 - \frac{\gamma_{\bar r+2t\rho}^2}{(1-\gamma_{\max(2t\rho,\bar r)})^2}}\;\sqrt{1-\gamma_{\bar r}}\;\|\bar{\mathbf{X}}_2^t\|_F - \|\mathbf{e}\|_2 \quad (\text{by the RIP condition})$$
$$\ge \sqrt{1 - \frac{\gamma_{\bar r+2t\rho}^2}{(1-\gamma_{\bar r+2t\rho})^2}}\;\sqrt{1-\gamma_{\bar r+2t\rho}}\;\|\bar{\mathbf{X}}_2^t\|_F - \|\mathbf{e}\|_2 \quad (\text{by } \gamma_{\bar r+2t\rho} \ge \gamma_{\max(2t\rho,\bar r)} \ge \gamma_{\bar r})$$
$$= \sqrt{\frac{1-2\gamma_{\bar r+2t\rho}}{1-\gamma_{\bar r+2t\rho}}}\;\|\bar{\mathbf{X}}_2^t\|_F - \|\mathbf{e}\|_2.$$
Recall that $f(\mathbf{X}^t) \ge C f(\bar{\mathbf{X}}) = \frac{C}{2}\|\mathbf{e}\|_2^2$, i.e., $\|\mathbf{e}\|_2 \le \sqrt{2f(\mathbf{X}^t)/C}$. By rearranging the above inequality, we have
$$f(\mathbf{X}^t) \;\ge\; \frac{1-2\gamma_{\bar r+2t\rho}}{1+\gamma_{\bar r+2t\rho}}\,\|\bar{\mathbf{X}}_2^t\|_F^2. \qquad (17)$$
This completes the proof.

4.4.3. PROOF OF LEMMA 5

First, we have the following bound on $f(\mathbf{X}^t)$ in terms of $\bar{\mathbf{X}}_2^t$ and $\mathbf{Z}_2^{t+1}$ when $\|\mathbf{E}^t\|_F$ is sufficiently small.

Lemma 9. Suppose that $\mathbf{X}^t$ is an approximate solution with $\mathbf{E}^t = P_{T_{\mathbf{X}^t}}(\mathbf{G}^t)$. For $\gamma_{\bar r+2t\rho} < 1/2$ and $\|\mathbf{E}^t\|_F$ sufficiently small, the following inequality holds at the $t$-th iteration:
$$\frac{1}{\tau_t}\,\langle\bar{\mathbf{X}}_2^t, \mathbf{Z}_2^{t+1}\rangle \;\ge\; f(\mathbf{X}^t). \qquad (18)$$

Proof. Recall the decomposition (11) and let $\xi^t = \mathcal{A}(\mathbf{X}^t) - \mathbf{b}$. Then it follows that
$$\langle\mathcal{A}(\bar{\mathbf{X}}), \xi^t\rangle = \langle\bar{\mathbf{X}}, \mathcal{A}^*(\xi^t)\rangle = \langle\bar{\mathbf{X}}_1^t, \mathbf{G}^t\rangle + \langle\bar{\mathbf{X}}_2^t, \mathbf{G}^t\rangle \quad (\text{by } (6))$$
$$= \langle\bar{\mathbf{X}}_1^t, \mathbf{E}^t\rangle + \langle\bar{\mathbf{X}}_2^t, \mathbf{G}_2^t\rangle \quad (\text{by } (10) \text{ and } (11))$$
$$= \langle\bar{\mathbf{X}}_1^t, \mathbf{E}^t\rangle - \frac{1}{\tau_t}\langle\bar{\mathbf{X}}_2^t, \mathbf{Z}_2^{t+1}\rangle. \quad (\text{by } (9)) \qquad (19)$$

Therefore, we have
$$2f(\mathbf{X}^t) = \|\mathcal{A}(\mathbf{X}^t) - \mathbf{b}\|_2^2 = (\mathcal{A}(\mathbf{X}^t))^T\xi^t - \mathbf{b}^T\xi^t = \langle\mathbf{X}^t, \mathcal{A}^*(\xi^t)\rangle - \mathbf{b}^T\xi^t$$
$$= \langle\mathbf{E}^t, \mathbf{X}^t\rangle - (\mathcal{A}(\bar{\mathbf{X}}) + \mathbf{e})^T\xi^t \quad (\text{by } (5))$$
$$= \langle\mathbf{E}^t, \mathbf{X}^t\rangle - \langle\mathcal{A}(\bar{\mathbf{X}}), \xi^t\rangle - \mathbf{e}^T\xi^t$$
$$= \langle\mathbf{E}^t, \mathbf{X}^t\rangle + \frac{1}{\tau_t}\langle\bar{\mathbf{X}}_2^t, \mathbf{Z}_2^{t+1}\rangle - \langle\bar{\mathbf{X}}_1^t, \mathbf{E}^t\rangle - \mathbf{e}^T\xi^t. \quad (\text{by } (19))$$
Based on the assumption $f(\mathbf{X}^t) \ge C f(\bar{\mathbf{X}}) = \frac{C}{2}\|\mathbf{e}\|_2^2$, where $C > 1$, we then have
$$|\mathbf{e}^T\xi^t| \;\le\; \|\mathbf{e}\|_2\,\|\xi^t\|_2 \;\le\; 2\sqrt{f(\bar{\mathbf{X}})\,f(\mathbf{X}^t)} \;\le\; \frac{2}{\sqrt{C}}\,f(\mathbf{X}^t).$$
It follows that
$$\frac{1}{\tau_t}\langle\bar{\mathbf{X}}_2^t, \mathbf{Z}_2^{t+1}\rangle = 2f(\mathbf{X}^t) + \mathbf{e}^T\xi^t - \langle\mathbf{E}^t, \mathbf{X}^t\rangle + \langle\bar{\mathbf{X}}_1^t, \mathbf{E}^t\rangle \;\ge\; 2\Big(1 - \frac{1}{\sqrt{C}}\Big)f(\mathbf{X}^t) - \langle\mathbf{E}^t, \mathbf{X}^t - \bar{\mathbf{X}}_1^t\rangle. \qquad (20)$$
Suppose $|\langle\mathbf{E}^t, \mathbf{X}^t - \bar{\mathbf{X}}_1^t\rangle| \le \vartheta f(\mathbf{X}^t)$ for some $\vartheta > 0$ (this is the sense in which $\|\mathbf{E}^t\|_F$ is assumed to be sufficiently small); then it follows that
$$\frac{1}{\tau_t}\langle\bar{\mathbf{X}}_2^t, \mathbf{Z}_2^{t+1}\rangle \;\ge\; 2\Big(1 - \frac{1}{\sqrt{C}}\Big)f(\mathbf{X}^t) - \vartheta f(\mathbf{X}^t) \;\ge\; f(\mathbf{X}^t),$$
where the last inequality holds whenever $2(1 - 1/\sqrt{C}) - \vartheta \ge 1$; we simplify the formulation by absorbing $\vartheta$ into the constant, written $\hat{C}$, with $\hat{C} > 1$, and we complete the proof.

Proof of Lemma 5. Based on Lemma 9, we have
$$\frac{1}{\tau_t}\,\|\bar{\mathbf{X}}_2^t\|_F\,\|\mathbf{Z}_2^{t+1}\|_F \;\ge\; \frac{1}{\tau_t}\,\langle\bar{\mathbf{X}}_2^t, \mathbf{Z}_2^{t+1}\rangle \;\ge\; f(\mathbf{X}^t).$$
Furthermore, with Lemma 4, it follows that
$$\|\mathbf{Z}_2^{t+1}\|_F^2 \;\ge\; \frac{\tau_t^2\,f(\mathbf{X}^t)^2}{\|\bar{\mathbf{X}}_2^t\|_F^2} \;\ge\; \tau_t^2\,\frac{1-2\gamma_{\bar r+2t\rho}}{1+\gamma_{\bar r+2t\rho}}\,f(\mathbf{X}^t).$$
The proof is completed.

4.4.4. PROOF OF LEMMA 6

Recall that $\mathbf{H}_2^t$ is obtained by the rank-$\rho$ truncated SVD of $\hat{\mathbf{G}}^t = P_{T_{\mathbf{X}^t}}^{\perp}(\mathbf{G}^t)$, and let $r_q^t = \mathrm{rank}(\mathbf{Z}_2^{t+1}) \le \bar r$, where $\mathbf{Z}_2^{t+1} = -\tau_t\mathbf{G}_2^t = -\tau_t P_{\bar T_{\mathbf{X}^t}}(\hat{\mathbf{G}}^t)$. Let $\bar{\mathbf{H}}$ be the rank-$\rho$ truncated SVD of $\mathbf{G}_2^t = P_{\bar T_{\mathbf{X}^t}}(\mathbf{G}^t)$. Since $\mathrm{Ran}(P_{\bar T_{\mathbf{X}^t}}) \subseteq \mathrm{Ran}(P_{T_{\mathbf{X}^t}}^{\perp})$, we have $\|\bar{\mathbf{H}}\|_F \le \|\mathbf{H}_2^t\|_F$. Accordingly, if $\rho \le r_q^t$, we have
$$\|\mathbf{H}_2^t\|_F^2 \;\ge\; \|\bar{\mathbf{H}}\|_F^2 \;\ge\; \frac{\rho}{r_q^t}\,\|\mathbf{G}_2^t\|_F^2 \;=\; \frac{\rho}{r_q^t}\cdot\frac{\|\mathbf{Z}_2^{t+1}\|_F^2}{\tau_t^2}.$$
It follows from Lemma 5 that
$$\|\mathbf{H}_2^t\|_F^2 \;\ge\; \frac{\rho}{r_q^t}\cdot\frac{1-2\gamma_{\bar r+2t\rho}}{1+\gamma_{\bar r+2t\rho}}\,f(\mathbf{X}^t).$$
Otherwise, if $\rho > r_q^t$, we have $\|\mathbf{H}_2^t\|_F^2 \ge \|\mathbf{Z}_2^{t+1}\|_F^2/\tau_t^2$, and the following inequality holds:
$$\|\mathbf{H}_2^t\|_F^2 \;\ge\; \frac{1-2\gamma_{\bar r+2t\rho}}{1+\gamma_{\bar r+2t\rho}}\,f(\mathbf{X}^t).$$
In summary, since $r_q^t \le \bar r$ and $\rho \le \bar r$, we have
$$\|\mathbf{H}_2^t\|_F^2 \;\ge\; \frac{\rho}{\bar r}\cdot\frac{1-2\gamma_{\bar r+2t\rho}}{1+\gamma_{\bar r+2t\rho}}\,f(\mathbf{X}^t).$$
The proof is completed.

References

Candès, E. J. and Plan, Y. Tight oracle bounds for low-rank matrix recovery from a minimal number of random measurements. IEEE Trans. on Inform. Theory, 57(4):2342–2359, 2011.

Dai, W. and Milenkovic, O. Subspace pursuit for compressive sensing signal reconstruction. IEEE Trans. Inform. Theory, 55(5):2230–2249, 2009.

Donoho, D. L., Tsaig, Y., Drori, I., and Starck, J. L. Sparse solution of underdetermined systems of linear equations by stagewise orthogonal matching pursuit. IEEE Trans. Inform. Theory, 58(2):1094–1121, 2012.

Ring, W. and Wirth, B. Optimization methods on Riemannian manifolds and their application to shape space. SIAM J. Optim., 22(2):596–627, 2012.

Sato, H. and Iwai, T. A new, globally convergent Riemannian conjugate gradient method. Optimization: A Journal of Mathematical Programming and Operations Research, ahead-of-print, 2013.

Shalit, U., Weinshall, D., and Chechik, G. Online learning in the embedded manifold of low-rank matrices. JMLR, 13:429–458, 2012.

Vandereycken, B. Low-rank matrix completion by Riemannian optimization. SIAM J. Optim., 23(2):1214–1236, 2013.