Supplementary Materials for Riemannian Pursuit for Big Matrix Recovery

Mingkui Tan, School of Computer Science, The University of Adelaide, Australia (TANMINGKUI@GMAIL.COM)
Ivor W. Tsang, University of Technology Sydney, Australia (IVOR.TSANG@GMAIL.COM)
Li Wang, Department of Mathematics, University of California San Diego, USA (LIW0@UCSD.EDU)
Bart Vandereycken, Department of Mathematics, Princeton University, USA (BARTV@PRINCETON.EDU)
Sinno Jialin Pan, Institute for Infocomm Research, Singapore (JSPAN@I2R.A-STAR.EDU.SG)

Abstract
In this supplementary file, we first present the parameter setting for ρ, and then give the proofs of the lemmas and theorems that appeared in the main paper.

1. Parameter Setting for ρ

In the main paper, we present (4), restated below, as a simple and effective method to choose ρ; it is motivated by the thresholding strategy in StOMP for sparse signal recovery (Donoho et al., 2012). Specifically, let σ be the vector of singular values of A∗(b), with the σ_i arranged in descending order. We choose ρ such that

σ_i ≥ η σ_1, ∀ i ≤ ρ,

where η = 0.60 is usually a good choice.

However, it is not trivial to predict the number of singular values that satisfy this condition for big matrices if we do not want to compute a full SVD. Since ρ is in general small, we propose to compute the σ_i sequentially until the condition is violated. Let B be a small integer. We propose to compute B singular values per iteration. Basically, if σ_i ≥ η σ_1 for all singular values computed so far, we can compute the next singular values σ_{i+1}, ..., σ_{i+B} by performing a rank-B truncated SVD on A_i = A∗(b) − Σ_{j≤i} σ_j u_j v_j^T using PROPACK. In practice, we suggest setting B to a small value. The schematic of the Sequential Truncated SVD for Setting ρ is presented in Algorithm 1. Notice that PROPACK involves only matrix-vector products with A_i and A_i^T; for the low-rank term U diag(σ) V^T in A_i, the product with a vector r can be computed cheaply as U (diag(σ) (V^T r)). We remark that, instead of Algorithm 1, a more efficient technique may involve restarting the Krylov-based method, like PROPACK, with an increasingly larger subspace until the condition is satisfied.
Algorithm 1 Sequential Truncated SVD for Setting ρ
1: Given η and A∗(b), initialize ρ = 2 and B > 1.
2: Do a rank-2 truncated SVD on A∗(b), obtaining σ ∈ R^2, U ∈ R^{m×2} and V ∈ R^{n×2}.
3: If σ_2 < η σ_1, stop and return ρ = 1.
4: while σ_ρ ≥ η σ_1 do
5:   Let ρ = ρ + B.
6:   Do a rank-B truncated SVD on A∗(b) − U diag(σ) V^T, obtaining σ_B ∈ R^B, U_B ∈ R^{m×B} and V_B ∈ R^{n×B}.
7:   Let U = [U, U_B], V = [V, V_B] and σ = [σ; σ_B].
8: end while
9: Return ρ.

Proceedings of the 31st International Conference on Machine Learning, Beijing, China, 2014. JMLR: W&CP volume 32. Copyright 2014 by the authors.
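As a concrete illustration, the procedure above can be sketched in a few lines of Python. This is a minimal sketch rather than the authors' implementation: `np.linalg.svd` stands in for the PROPACK truncated SVDs, and the function and parameter names (`choose_rho`, `trunc_svd`) are our own.

```python
import numpy as np

def trunc_svd(A, k):
    """Rank-k truncated SVD; np.linalg.svd is a stand-in for PROPACK here."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]

def choose_rho(M, eta=0.60, B=2):
    """Count the leading singular values with sigma_i >= eta * sigma_1,
    computing B values at a time and deflating the part already found."""
    U, s, Vt = trunc_svd(M, 2)
    if s[1] < eta * s[0]:
        return 1
    rho = 2
    while s[rho - 1] >= eta * s[0] and rho < min(M.shape):
        # Deflate the known low-rank part, then take B more triplets.
        Ub, sb, Vbt = trunc_svd(M - (U * s) @ Vt, B)
        U, s, Vt = np.hstack([U, Ub]), np.concatenate([s, sb]), np.vstack([Vt, Vbt])
        rho += B
    # The last batch may overshoot; count only the values that pass the test.
    return int(np.sum(s >= eta * s[0]))
```

For example, on diag(10, 9, 8, 1, 0.5) with η = 0.60 this returns 3, after two small truncated SVDs instead of a full one.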
2. Main Theoretical Results in the Paper

We first repeat the main results in the paper before we prove Lemma 1 and Theorem 1; the other results were already proven in the paper.

Proposition 1. In MC, suppose the observed entry set Ξ is sampled according to the Bernoulli model, with each entry (i, j) ∈ Ξ being drawn independently with probability p. There exists a constant C > 0 such that, for all γ_r ∈ (0, 1), µ_B ≥ 1 and n ≥ m ≥ 3, if p ≥ C µ_B² r log(n) / (γ_r² m), then the following RIP condition holds:

(1 − γ_r) p ‖X‖²_F ≤ ‖P_Ξ(X)‖²_F ≤ (1 + γ_r) p ‖X‖²_F,

for any µ_B-incoherent matrix X ∈ R^{m×n} of rank at most r, with probability at least 1 − exp(−n log n).

Lemma 1. Let {X_t} be the sequence generated by RP. Then

f(X_t) − f(X_{t+1}) ≥ (τ_t/2) ‖H_t‖²_F,    (3)

where τ_t is a step size satisfying the line-search condition of the paper.

Theorem 1. Let {X_t} be the sequence generated by RP and ζ = min{τ_1, ..., τ_ι}. Suppose f(X_t) ≥ C f(X̄) = (C/2)‖e‖² with C > 1, and suppose there exists an integer ι > 0 such that γ_{r̄+2ιρ} < 1/2, where r̄ is the rank of the ground-truth matrix X̄. Then RP decreases linearly in objective values when t < ι, namely f(X_{t+1}) ≤ ν f(X_t), where

ν = 1 − (ρζ / (4Ĉ² r̄)) · (1 − 2γ_{r̄+2ιρ}) / (1 + γ_{r̄+2ιρ}),

and Ĉ > 1 is a constant depending only on C (see Lemma 9 below).

Proposition 2 (Sato & Iwai, 2013). Given the retraction and vector transport on M_r defined in the paper, there exists a step size θ_k that satisfies the strong Wolfe conditions of the paper.

Lemma 2. If c₂ < 1/2, then the search direction ζ_k generated by NRCG with the Fletcher–Reeves rule and strong Wolfe step size control satisfies

−1/(1 − c₂) ≤ ⟨grad f(X_k), ζ_k⟩ / ⟨grad f(X_k), grad f(X_k)⟩ ≤ (2c₂ − 1)/(1 − c₂).    (4)

Theorem 2. Let {X_k} be the sequence generated by NRCG with the strong Wolfe line search, where 0 < c₁ < c₂ < 1/2. Then lim inf_{k→∞} ‖grad f(X_k)‖ = 0.

3. Proof of Lemma 1 in the Paper

The step size τ_t is determined such that

f(R_{X_t}(τ_t H_t)) ≤ f(X_t) − (τ_t/2) ⟨H_t, H_t⟩.

Since ⟨H_t^1, H_t^2⟩ = 0, it follows that

f(X_t) − f(X_{t+1}) = f(X_t) − f(R_{X_t}(τ_t H_t)) ≥ (τ_t/2) ⟨H_t, H_t⟩ = (τ_t/2) ‖H_t‖²_F ≥ 0,

where the equality holds when H_t = 0, which happens if we solve the master problem exactly.

4. Proof of Theorem 1 in the Paper

4.1. Key Lemmas

To complete the proof of Theorem 1, we first recall a property of the orthogonal projection P_{T_X M_r}(Z).

Lemma 3.
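Lemma 2 has a well-known Euclidean analogue (the bound for Fletcher–Reeves CG under a strong Wolfe line search with c₂ < 1/2; see Nocedal & Wright, Lemma 5.6); the Riemannian version replaces Euclidean inner products with the Riemannian metric and uses vector transport. A minimal numerical sketch of the Euclidean case on a quadratic, where a step satisfying the strong Wolfe conditions can be written in closed form; the 0.8 factor and all names are illustrative assumptions, not the paper's method:

```python
import numpy as np

# Quadratic test problem f(x) = 0.5 x^T A x with SPD A, so grad f(x) = A x.
A = np.diag(np.arange(1.0, 7.0))
c2 = 0.4                       # strong Wolfe curvature parameter, c2 < 1/2

def wolfe_step(x, d):
    """A step satisfying the strong Wolfe conditions on this quadratic.

    The exact minimizer along d is a* = -g.d / d.Ad; taking 0.8*a* leaves
    slope phi'(a) = 0.2*(g.d), so |phi'(a)| <= c2*|g.d| holds with c2 = 0.4,
    and the Armijo condition holds for any c1 <= 0.6.  No search needed.
    """
    g = A @ x
    return 0.8 * (-(g @ d) / (d @ (A @ d)))

x = np.ones(6)
g = A @ x
d = -g                          # first direction: steepest descent
lo, hi = -1.0 / (1.0 - c2), (2.0 * c2 - 1.0) / (1.0 - c2)

ratios = []
for _ in range(8):
    alpha = wolfe_step(x, d)
    x = x + alpha * d
    g_new = A @ x
    beta = (g_new @ g_new) / (g @ g)     # Fletcher-Reeves coefficient
    d = -g_new + beta * d
    g = g_new
    ratios.append((g @ d) / (g @ g))     # Lemma 2 predicts this stays in [lo, hi]

print(all(lo <= r <= hi for r in ratios))   # True
```

With this particular step the ratio obeys the recursion r_{k+1} = −1 + 0.2 r_k and tends to −1.25, comfortably inside the predicted interval [−1/(1−c₂), (2c₂−1)/(1−c₂)] = [−5/3, −1/3].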
Given the orthogonal projection P_{T_X M_r}(Z) = P_U Z P_V + P_U^⊥ Z P_V + P_U Z P_V^⊥, we have rank(P_{T_X M_r}(Z)) ≤ 2 min(rank(Z), rank(P_U)) for any X. In addition, we have ‖P_{T_X M_r}(Z)‖_F ≤ ‖Z‖_F for any Z ∈ R^{m×n}.
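Both claims of Lemma 3 are easy to check numerically. A small sketch with assumed dimensions; `proj_tangent` is our own helper implementing the projection formula above:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, r = 30, 20, 3

# Orthonormal factors of a rank-r point X = U S V^T on M_r.
U, _ = np.linalg.qr(rng.standard_normal((m, r)))
V, _ = np.linalg.qr(rng.standard_normal((n, r)))

def proj_tangent(Z):
    """P_{T_X M_r}(Z) = P_U Z + Z P_V - P_U Z P_V."""
    PUZ = U @ (U.T @ Z)
    ZPV = (Z @ V) @ V.T
    PUZPV = U @ ((U.T @ Z) @ V) @ V.T
    return PUZ + ZPV - PUZPV

Z = rng.standard_normal((m, n))
PZ = proj_tangent(Z)

print(np.linalg.matrix_rank(PZ))                 # at most 2*r = 6
print(np.linalg.norm(PZ) <= np.linalg.norm(Z))   # True: nonexpansive
print(np.allclose(proj_tangent(PZ), PZ))         # True: an orthogonal projection
```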
Proof. According to (Shalit et al., 2012), the three terms in P_{T_X M_r}(Z) are orthogonal to each other. Since rank(P_U) = rank(P_V), we have

rank(P_{T_X M_r}(Z)) = rank(Z P_V + P_U Z P_V^⊥) ≤ min(rank(Z), rank(P_V)) + min(rank(Z), rank(P_U)) = 2 min(rank(Z), rank(P_U)).

The relation ‖P_{T_X M_r}(Z)‖_F ≤ ‖Z‖_F follows immediately from the fact that P_{T_X M_r} is an orthogonal projection for the Frobenius norm.

4.2. Notation

We first introduce some notation. First, let X̄ and e be the ground-truth low-rank matrix and the additive noise, respectively. Moreover, let {X_t} be the sequence generated by RP, ξ_t = A(X_t) − b and G_t = A∗(ξ_t). In RP, we solve the fixed-rank subproblem approximately by the NRCG method. Recall the definition of the orthogonal projection onto the tangent space at X = U S V^T:

P_{T_X M_r}(Z) := P_{T_X}(Z) = P_U Z P_V + P_U^⊥ Z P_V + P_U Z P_V^⊥ = P_U Z + Z P_V − P_U Z P_V,

where P_U = U U^T and P_V = V V^T. In addition, denote by P_{T_X}^⊥ the complement of P_{T_X}, i.e.,

P_{T_X}^⊥(Z) = (I − P_U) Z (I − P_V).

Now, recalling E_t = P_{T_{X_t} M_{tρ}}(G_t) = P_{T_{X_t} M_{tρ}}(A∗(ξ_t)), we have

⟨X_t, A∗(ξ_t)⟩ = ⟨X_t, E_t⟩.    (5)

At the t-th iteration, rank(X_t) = tρ, thus X_t − X̄ is at most of rank r̄ + tρ, where r̄ = rank(X̄). By the orthogonal projection P_{T_{X_t}}, we decompose X̄ into two mutually orthogonal matrices:

X̄ = X̄_t^1 + X̄_t^2, where X̄_t^1 = P_{T_{X_t}}(X̄) and X̄_t^2 = P_{T_{X_t}}^⊥(X̄).    (6)

Based on the above decomposition, it follows that

⟨X_t − X̄_t^1, X̄_t^2⟩ = 0.

Without loss of generality, we assume that r̄ ≥ tρ. According to Lemma 3, we have rank(X̄_t^1) ≤ 2 min(rank(X̄), rank(P_U)) = 2tρ and rank(X̄_t^2) ≤ r̄. Moreover, since the column (resp. row) space of X_t − X̄_t^1 is contained in the span of the columns of U and X̄V (resp. V and X̄^T U), we have rank(X_t − X̄_t^1) ≤ 2tρ.

Note that, at the (t+1)-th iteration of RP, we increase the rank of X_t by ρ by performing a line search along H_t = H_t^1 + H_t^2, where H_t^1 = −P_{T_{X_t}}(G_t) and H_t^2 = −U_ρ diag(σ_ρ) V_ρ^T ∈ Ran(P_{T_{X_t}}^⊥), with U_ρ diag(σ_ρ) V_ρ^T the rank-ρ truncated SVD of P_{T_{X_t}}^⊥(G_t). For convenience of description, we define an internal variable Z_{t+1} ∈ R^{m×n} as

Z_{t+1} = X_t − τ_t G_t,    (7)

where τ_t is the step size used in the line search of the paper.
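The projector identities of this notation section are easy to verify numerically: at a fixed point X, P_{T_X} and its complement P_{T_X}^⊥ split any Z into two mutually orthogonal pieces that sum back to Z. A small sketch (all names assumed):

```python
import numpy as np

rng = np.random.default_rng(3)
m, n, r = 12, 10, 2
U, _ = np.linalg.qr(rng.standard_normal((m, r)))
V, _ = np.linalg.qr(rng.standard_normal((n, r)))
PU, PV = U @ U.T, V @ V.T

Z = rng.standard_normal((m, n))
P_t  = PU @ Z + Z @ PV - PU @ Z @ PV              # P_{T_X}(Z)
P_tc = (np.eye(m) - PU) @ Z @ (np.eye(n) - PV)    # P^perp_{T_X}(Z)

print(np.allclose(P_t + P_tc, Z))       # True: the two parts recover Z
print(abs(np.sum(P_t * P_tc)) < 1e-9)   # True: orthogonal in the trace inner product
```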
Based on the decomposition of H_t, we decompose Z_{t+1} into

Z_{t+1} = Z_{t+1}^1 + Z_{t+1}^2 + Z_{t+1}^R.    (8)
where

Z_{t+1}^1 = P_{T_{X_t}}(Z_{t+1}) = X_t − τ_t P_{T_{X_t}}(G_t),
Z_{t+1}^2 = P_{T_{X̄_t}}^⊥(Z_{t+1}) = −τ_t P_{T_{X̄_t}}^⊥(G_t),    (9)
Z_{t+1}^R = (I − P_{T_{X_t}} − P_{T_{X̄_t}}^⊥)(Z_{t+1}).

Observe that, from (6), we have X_t^T X̄_t^2 = (X̄_t^1)^T X̄_t^2 = 0. Hence,

Ran(P_{T_{X_t}}) ⊥ Ran(P_{T_{X̄_t}}^⊥) ⊥ Ran(I − P_{T_{X_t}} − P_{T_{X̄_t}}^⊥),    (10)

which implies that the three matrices from above are mutually orthogonal. Similarly, G_t is decomposed into three mutually orthogonal parts:

G_t = G_t^1 + G_t^2 + G_t^R,

where G_t^1 = P_{T_{X_t}}(G_t), G_t^2 = P_{T_{X̄_t}}^⊥(G_t), and G_t^R = (I − P_{T_{X_t}} − P_{T_{X̄_t}}^⊥)(G_t).

4.3. Proof of Theorem 1

The proof of Theorem 1 involves three bounds on f(X_t), in terms of X̄_t^2, Z_{t+1}^2 and ‖H_t‖_F, respectively. For convenience, we first list these bounds in order to complete the proof of Theorem 1, and we leave the detailed proofs of the three bounds to Section 4.4.

First, the following lemma gives the bound of f(X_t) in terms of X̄_t^2.

Lemma 4. At the t-th iteration, if γ_{r̄+2tρ} < 1/2, then

f(X_t) ≥ (1 − 2γ_{r̄+2tρ}) / (2 (1 + 1/√C)² (1 + γ_{r̄+2tρ})) ‖X̄_t^2‖²_F.

The following lemma bounds f(X_t) w.r.t. Z_{t+1}^2.

Lemma 5. Suppose ‖E_t‖_F is sufficiently small, where E_t = P_{T_{X_t}}(G_t). For γ_{r̄+2tρ} < 1/2 and C > 1, we have

‖Z_{t+1}^2‖²_F ≥ (τ_t² / (2Ĉ²)) (1 − 2γ_{r̄+2tρ}) / (1 + γ_{r̄+2tρ}) f(X_t),

where Ĉ > 1 is the constant from Lemma 9 below.

By combining Lemmas 4 and 5 above, we shall show the following bound for f(X_t) w.r.t. H_t.

Lemma 6. If γ_{r̄+2tρ} < 1/2, then at the t-th iteration we have

‖H_t‖²_F ≥ (ρ / (2Ĉ² r̄)) (1 − 2γ_{r̄+2tρ}) / (1 + γ_{r̄+2tρ}) f(X_t).

Proof of Theorem 1. By combining Lemma 1 and Lemma 6, we have

f(X_{t+1}) ≤ f(X_t) − (τ_t/2) ‖H_t‖²_F ≤ (1 − (ρτ_t / (4Ĉ² r̄)) (1 − 2γ_{r̄+2tρ}) / (1 + γ_{r̄+2tρ})) f(X_t).

The variable τ_t is a step size obtained by the line search, so with ζ = min{τ_1, ..., τ_ι} the above relation holds with τ_t replaced by ζ for each t < ι, where γ_{r̄+2ιρ} < 1/2. Note that (1 − 2γ)/(1 + γ) is decreasing w.r.t. γ on (0, 1/2). In addition, since γ_{r̄+2tρ} ≤ γ_{r̄+2ιρ} holds for all t ≤ ι, the following relation holds if γ_{r̄+2ιρ} < 1/2 and t < ι:

f(X_{t+1}) ≤ (1 − (ρζ / (4Ĉ² r̄)) (1 − 2γ_{r̄+2ιρ}) / (1 + γ_{r̄+2ιρ})) f(X_t).

This completes the proof of Theorem 1.
4.4. Detailed Proofs of the Three Bounds

4.4.1. KEY LEMMAS

To proceed, we need to recall a property of the constant γ_r in the RIP.

Lemma 7 (Lemma 3.3 in Candès & Plan, 2011). For all X_P, X_Q ∈ R^{m×n} with rank(X_P) ≤ r_p and rank(X_Q) ≤ r_q satisfying ⟨X_P, X_Q⟩ = 0, we have

|⟨A(X_P), A(X_Q)⟩| ≤ γ_{r_p+r_q} ‖X_P‖_F ‖X_Q‖_F.

In addition, for any two integers r ≤ s, we have γ_r ≤ γ_s (Dai & Milenkovic, 2009).

Suppose b_q = A(X̄) for some X̄ with rank(X̄) = r_q. Define b_p = A(X_P), where X_P is the optimal solution of the following problem:

min_X (1/2) ‖A(X) − A(X̄)‖², s.t. rank(X) = r_p, ⟨X, X̄⟩ = 0.    (13)

Let b_r = b_p − b_q = A(X_P) − A(X̄). Then the following relations hold.

Lemma 8. With b_q, b_p and b_r defined above, if γ_{max(r_p,r_q)} + γ_{r_p+r_q} ≤ 1, then

‖b_p‖ ≤ (γ_{r_p+r_q} / (1 − γ_{max(r_p,r_q)})) ‖b_q‖,    (14)

and

√(1 − (γ_{r_p+r_q} / (1 − γ_{max(r_p,r_q)}))²) ‖b_q‖ ≤ ‖b_r‖ ≤ ‖b_q‖.    (15)

Proof. Since ⟨X̄, X_P⟩ = 0, with Lemma 7 we have

b_p^T b_q = ⟨A(X_P), A(X̄)⟩ ≤ γ_{r_p+r_q} ‖X_P‖_F ‖X̄‖_F ≤ γ_{r_p+r_q} (‖b_p‖/√(1−γ_{r_p})) (‖b_q‖/√(1−γ_{r_q})) ≤ (γ_{r_p+r_q} / (1 − γ_{max(r_p,r_q)})) ‖b_p‖ ‖b_q‖.

Now we show that b_p^T b_r = 0. Let X_P = U diag(σ) V^T. Since X_P is the minimizer of (13), σ is also the minimizer of the following problem:

min_σ ‖b_q − Dσ‖²,    (16)

where D = [A(u_1 v_1^T), ..., A(u_{r_p} v_{r_p}^T)]. The Galerkin condition for this linear least-squares system states that b^T (Dσ − b_q) = 0 for any b in the column span of D. Since b_p = A(X_P) is included in the span of D, we obtain b_p^T b_r = 0. Recall now that b_r = b_p − b_q. Then it follows that b_p^T b_q = b_p^T (b_p − b_r) = ‖b_p‖². Accordingly, we have

‖b_p‖ ≤ (γ_{r_p+r_q} / (1 − γ_{max(r_p,r_q)})) ‖b_q‖.

Using the reverse triangle inequality, ‖b_r‖ = ‖b_q − b_p‖ ≥ ‖b_q‖ − ‖b_p‖, we obtain

‖b_r‖ ≥ (1 − γ_{r_p+r_q} / (1 − γ_{max(r_p,r_q)})) ‖b_q‖.

By the Galerkin condition of (16), we further have ‖b_q‖² = ‖b_r‖² + ‖b_p‖², so that

√(1 − (γ_{r_p+r_q} / (1 − γ_{max(r_p,r_q)}))²) ‖b_q‖ ≤ ‖b_r‖ ≤ ‖b_q‖.

Finally, the condition γ_{max(r_p,r_q)} + γ_{r_p+r_q} ≤ 1 ensures the positiveness of 1 − γ_{r_p+r_q}/(1 − γ_{max(r_p,r_q)}). This completes the proof.
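The Galerkin (normal-equations) argument in the proof of Lemma 8 is the standard orthogonality property of linear least squares, which can be checked numerically; a small sketch with an arbitrary random matrix playing the role of D:

```python
import numpy as np

rng = np.random.default_rng(2)
D = rng.standard_normal((50, 5))   # columns play the role of D in Lemma 8's proof
b_q = rng.standard_normal(50)

# Least-squares fit: b_p is the orthogonal projection of b_q onto span(D).
sigma, *_ = np.linalg.lstsq(D, b_q, rcond=None)
b_p = D @ sigma
b_r = b_p - b_q

print(abs(b_p @ b_r) < 1e-9)                          # True: Galerkin orthogonality
print(np.isclose(b_q @ b_q, b_p @ b_p + b_r @ b_r))   # True: Pythagoras
```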
4.4.2. PROOF OF LEMMA 4

Recall (6). Since b = A(X̄) + e, we have

√(2 f(X_t)) = ‖A(X_t) − b‖ = ‖A(X_t − X̄) − e‖ ≥ ‖A(X_t − X̄_t^1) − A(X̄_t^2)‖ − ‖e‖.

Note that ⟨X_t − X̄_t^1, X̄_t^2⟩ = 0 and rank(X_t − X̄_t^1) ≤ 2tρ. By applying Lemma 8, where we let b_q = A(X̄_t^2) and b_r = A(X_P) − A(X̄_t^2) with X_P specified as in (13), it follows that

√(2 f(X_t)) ≥ min_{rank(X)=2tρ, ⟨X, X̄_t^2⟩=0} ‖A(X) − A(X̄_t^2)‖ − ‖e‖
≥ √(1 − (γ_{r̄+2tρ} / (1 − γ_{max(2tρ, r̄)}))²) ‖A(X̄_t^2)‖ − ‖e‖   (by Lemma 8)
≥ √(1 − (γ_{r̄+2tρ} / (1 − γ_{max(2tρ, r̄)}))²) √(1 − γ_{r̄}) ‖X̄_t^2‖_F − ‖e‖   (by the RIP condition)
≥ √(1 − (γ_{r̄+2tρ} / (1 − γ_{r̄+2tρ}))²) √(1 − γ_{r̄+2tρ}) ‖X̄_t^2‖_F − ‖e‖   (by γ_{max(2tρ, r̄)} ≤ γ_{r̄+2tρ} and γ_{r̄} ≤ γ_{r̄+2tρ})
= √((1 − 2γ_{r̄+2tρ}) / (1 − γ_{r̄+2tρ})) ‖X̄_t^2‖_F − ‖e‖
≥ √((1 − 2γ_{r̄+2tρ}) / (1 + γ_{r̄+2tρ})) ‖X̄_t^2‖_F − ‖e‖.

Recall that f(X_t) ≥ C f(X̄) = (C/2) ‖e‖², so that ‖e‖ ≤ √(2 f(X_t)/C). By rearranging the above inequality, we have

f(X_t) ≥ (1 − 2γ_{r̄+2tρ}) / (2 (1 + 1/√C)² (1 + γ_{r̄+2tρ})) ‖X̄_t^2‖²_F.    (17)

This completes the proof.

4.4.3. PROOF OF LEMMA 5

First, we have the following bound on f(X_t) in terms of ⟨X̄_t^2, Z_{t+1}^2⟩ when ‖E_t‖_F is sufficiently small.

Lemma 9. Suppose that X_t is an approximate solution with E_t = P_{T_{X_t}}(G_t). For γ_{r̄+2tρ} < 1/2, C > 1 and ‖E_t‖_F sufficiently small, the following inequality holds at the t-th iteration:

(1/τ_t) ⟨X̄_t^2, Z_{t+1}^2⟩ ≥ (2/Ĉ) f(X_t),    (18)

where Ĉ > 1.

Proof. Recall the decomposition (6) and let ξ_t = A(X_t) − b. Then it follows that

(A(X̄))^T ξ_t = ⟨X̄, A∗(ξ_t)⟩ = ⟨X̄_t^1 + X̄_t^2, G_t⟩   (by (6))
= ⟨X̄_t^1, E_t⟩ + ⟨X̄_t^2, G_t⟩   (by (10))
= ⟨X̄_t^1, E_t⟩ − (1/τ_t) ⟨X̄_t^2, Z_{t+1}^2⟩.   (by (9))    (19)
Therefore, we have

f(X_t) = (1/2) ‖A(X_t) − b‖² = (1/2) (A(X_t))^T ξ_t − (1/2) b^T ξ_t
= (1/2) ⟨X_t, A∗(ξ_t)⟩ − (1/2) b^T ξ_t
= (1/2) ⟨E_t, X_t⟩ − (1/2) (A(X̄) + e)^T ξ_t   (by (5))
= (1/2) ⟨E_t, X_t⟩ − (1/2) (A(X̄))^T ξ_t − (1/2) e^T ξ_t
= (1/2) ⟨E_t, X_t⟩ + (1/(2τ_t)) ⟨X̄_t^2, Z_{t+1}^2⟩ − (1/2) ⟨X̄_t^1, E_t⟩ − (1/2) e^T ξ_t.   (by (19))

Based on the assumption f(X_t) ≥ C f(X̄) = (C/2) ‖e‖², where C > 1, we then have

|e^T ξ_t| ≤ ‖e‖ ‖ξ_t‖ ≤ √(2 f(X_t)/C) √(2 f(X_t)) = (2/√C) f(X_t).

It follows that

(1/τ_t) ⟨X̄_t^2, Z_{t+1}^2⟩ = 2 f(X_t) + e^T ξ_t − ⟨E_t, X_t⟩ + ⟨X̄_t^1, E_t⟩
≥ 2 f(X_t) − (2/√C) f(X_t) − |⟨E_t, X_t − X̄_t^1⟩|.    (20)

Suppose |⟨E_t, X_t − X̄_t^1⟩| ≤ 2ϑ f(X_t) for some ϑ > 0. Then it follows that

(1/τ_t) ⟨X̄_t^2, Z_{t+1}^2⟩ ≥ 2 f(X_t) − (2/√C) f(X_t) − 2ϑ f(X_t).

Since ‖E_t‖_F, and hence ϑ, is sufficiently small, we can simplify the formulation by absorbing ϑ into the constant: let 1 − 1/√C − ϑ =: 1/Ĉ, where Ĉ > 1, and we complete the proof.

Proof of Lemma 5. Based on Lemma 9, we have

(1/τ_t) ‖X̄_t^2‖_F ‖Z_{t+1}^2‖_F ≥ (1/τ_t) ⟨X̄_t^2, Z_{t+1}^2⟩ ≥ (2/Ĉ) f(X_t).

Furthermore, with Lemma 4, it follows that

‖Z_{t+1}^2‖²_F ≥ (4 τ_t² f(X_t)²) / (Ĉ² ‖X̄_t^2‖²_F) ≥ (2 τ_t² / (Ĉ² (1 + 1/√C)²)) (1 − 2γ_{r̄+2tρ}) / (1 + γ_{r̄+2tρ}) f(X_t) ≥ (τ_t² / (2Ĉ²)) (1 − 2γ_{r̄+2tρ}) / (1 + γ_{r̄+2tρ}) f(X_t),

where the last inequality uses (1 + 1/√C)² ≤ 4 for C > 1. The proof is completed.
4.4.4. PROOF OF LEMMA 6

Recall that H_t^2 is obtained by the rank-ρ truncated SVD of Ĝ_t = P_{T_{X_t}}^⊥(G_t), and let r_q = rank(Z_{t+1}^2) ≤ r̄, where Z_{t+1}^2 = −τ_t Ḡ_t with Ḡ_t = P_{T_{X̄_t}}^⊥(G_t). Let H̄ be the rank-ρ truncated SVD of Ḡ_t. Since Ran(P_{T_{X̄_t}}^⊥) ⊆ Ran(P_{T_{X_t}}^⊥), we have ‖H̄‖_F ≤ ‖H_t^2‖_F. Accordingly, if ρ ≤ r_q, we have

‖H_t^2‖²_F ≥ ‖H̄‖²_F ≥ (ρ/r_q) ‖Ḡ_t‖²_F = (ρ/r_q) ‖Z_{t+1}^2‖²_F / τ_t².

It follows from Lemma 5 that

‖H_t^2‖²_F ≥ (ρ / (2Ĉ² r_q)) (1 − 2γ_{r̄+2tρ}) / (1 + γ_{r̄+2tρ}) f(X_t).

Otherwise, if ρ > r_q, we have ‖H_t^2‖²_F ≥ ‖Z_{t+1}^2‖²_F / τ_t², and the following inequality holds:

‖H_t^2‖²_F ≥ (1 / (2Ĉ²)) (1 − 2γ_{r̄+2tρ}) / (1 + γ_{r̄+2tρ}) f(X_t).

In summary, since r_q ≤ r̄ and ρ ≤ r̄, we have

‖H_t‖²_F ≥ ‖H_t^2‖²_F ≥ (ρ / (2Ĉ² r̄)) (1 − 2γ_{r̄+2tρ}) / (1 + γ_{r̄+2tρ}) f(X_t).

The proof is completed.

References

Candès, E. J. and Plan, Y. Tight oracle inequalities for low-rank matrix recovery from a minimal number of noisy random measurements. IEEE Trans. Inform. Theory, 57(4):2342–2359, 2011.

Dai, W. and Milenkovic, O. Subspace pursuit for compressive sensing signal reconstruction. IEEE Trans. Inform. Theory, 55(5):2230–2249, 2009.

Donoho, D. L., Tsaig, Y., Drori, I., and Starck, J. L. Sparse solution of underdetermined systems of linear equations by stagewise orthogonal matching pursuit. IEEE Trans. Inform. Theory, 58(2):1094–1121, 2012.

Ring, W. and Wirth, B. Optimization methods on Riemannian manifolds and their application to shape space. SIAM J. Optim., 22(2):596–627, 2012.

Sato, H. and Iwai, T. A new, globally convergent Riemannian conjugate gradient method. Optimization: A Journal of Mathematical Programming and Operations Research, ahead-of-print, 2013.

Shalit, U., Weinshall, D., and Chechik, G. Online learning in the embedded manifold of low-rank matrices. JMLR, 13:429–458, 2012.

Vandereycken, B. Low-rank matrix completion by Riemannian optimization. SIAM J. Optim., 23(2):1214–1236, 2013.