Guaranteed Rank Minimization via Singular Value Projection: Supplementary Material
Prateek Jain, Microsoft Research Bangalore, Bangalore, India
Raghu Meka, UT Austin, Dept. of Computer Sciences, Austin, TX, USA
Inderjit Dhillon, UT Austin, Dept. of Computer Sciences, Austin, TX, USA

Appendix A: Analysis for Affine Constraints Satisfying RIP

We first give a proof of Lemma 2.1, which bounds the error of the $(t+1)$-st iterate, $\psi(X^{t+1})$, in terms of the error incurred by the $t$-th iterate and the optimal solution.

Proof of Lemma 2.1. Recall that $\psi(X) = \frac{1}{2}\|\mathcal{A}(X) - b\|_2^2$. Since $\psi(\cdot)$ is a quadratic function, we have
$$\psi(X^{t+1}) - \psi(X^t) = \langle \nabla\psi(X^t), X^{t+1} - X^t \rangle + \tfrac{1}{2}\|\mathcal{A}(X^{t+1} - X^t)\|_2^2 \le \langle \mathcal{A}^T(\mathcal{A}(X^t) - b), X^{t+1} - X^t \rangle + \tfrac{1}{2}(1+\delta_{2k})\|X^{t+1} - X^t\|_F^2, \qquad (0.1)$$
where the inequality follows from RIP applied to the matrix $X^{t+1} - X^t$ of rank at most $2k$. Let
$$Y^{t+1} = X^t - \tfrac{1}{1+\delta_{2k}}\,\mathcal{A}^T(\mathcal{A}(X^t) - b) \quad \text{and} \quad f_t(X) = \langle \mathcal{A}^T(\mathcal{A}(X^t) - b), X - X^t \rangle + \tfrac{1}{2}(1+\delta_{2k})\|X - X^t\|_F^2.$$
Now,
$$f_t(X) = \tfrac{1}{2}(1+\delta_{2k})\left[\|X - X^t\|_F^2 + 2\left\langle \tfrac{\mathcal{A}^T(\mathcal{A}(X^t) - b)}{1+\delta_{2k}},\, X - X^t \right\rangle\right] = \tfrac{1}{2}(1+\delta_{2k})\|X - Y^{t+1}\|_F^2 - \tfrac{1}{2(1+\delta_{2k})}\|\mathcal{A}^T(\mathcal{A}(X^t) - b)\|_F^2.$$
Thus, by definition, $P_k(Y^{t+1}) = X^{t+1}$ is the minimizer of $f_t(X)$ over all matrices $X \in C(k)$ (of rank at most $k$). In particular, $f_t(X^{t+1}) \le f_t(X^*)$, and hence
$$\psi(X^{t+1}) - \psi(X^t) \le f_t(X^{t+1}) \le f_t(X^*) = \langle \mathcal{A}^T(\mathcal{A}(X^t) - b), X^* - X^t \rangle + \tfrac{1}{2}(1+\delta_{2k})\|X^* - X^t\|_F^2$$
$$\le \langle \mathcal{A}^T(\mathcal{A}(X^t) - b), X^* - X^t \rangle + \tfrac{1}{2}\cdot\tfrac{1+\delta_{2k}}{1-\delta_{2k}}\,\|\mathcal{A}(X^* - X^t)\|_2^2 \qquad (0.2)$$
$$= \psi(X^*) - \psi(X^t) + \tfrac{\delta_{2k}}{1-\delta_{2k}}\,\|\mathcal{A}(X^* - X^t)\|_2^2,$$
where inequality (0.2) follows from RIP applied to $X^* - X^t$.
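To make the iteration analyzed above concrete, here is a minimal sketch of SVP for an affine map $\mathcal{A}$ represented by sensing matrices $A_i$ (our illustration, not the authors' code; the dimensions, the number of measurements, and the step size $\eta = 1$ standing in for $1/(1+\delta_{2k})$ are arbitrary choices):

```python
import numpy as np

def P_k(Y, k):
    """Project Y onto the set of rank-at-most-k matrices via truncated SVD."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

def svp(A_mats, b, shape, k, eta, iters=100):
    """SVP iteration: X^{t+1} = P_k(X^t - eta * A^T(A(X^t) - b)).

    A_mats is a list of sensing matrices A_i, so (A(X))_i = <A_i, X>;
    eta plays the role of 1/(1 + delta_2k) in the analysis.
    """
    X = np.zeros(shape)
    for _ in range(iters):
        residual = np.array([np.sum(Ai * X) for Ai in A_mats]) - b  # A(X) - b
        grad = sum(r * Ai for r, Ai in zip(residual, A_mats))       # A^T(A(X) - b)
        X = P_k(X - eta * grad, k)
    return X

# Illustrative run: Gaussian measurements of a random rank-2 matrix.
rng = np.random.default_rng(0)
m, n, k, num_meas = 20, 15, 2, 400
X_star = rng.standard_normal((m, k)) @ rng.standard_normal((k, n))
A_mats = [rng.standard_normal((m, n)) / np.sqrt(num_meas) for _ in range(num_meas)]
b = np.array([np.sum(Ai * X_star) for Ai in A_mats])
X_hat = svp(A_mats, b, (m, n), k, eta=1.0)
print(np.linalg.norm(X_hat - X_star) / np.linalg.norm(X_star))  # small on success
```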
2 Proof of Theorem.2 Let the current solution X t satisfy ψ(x t ) G e 2 /2, where G 0 is a universal constant. Using Lemma 2. and the fact that b A(X ) = e, δ 2k ψ(x t+ ) e ( δ 2k ) b A(Xt ) e 2 2 e δ 2k 2 ( δ 2k ) (ψ(xt ) e T (b A(X t )) + e 2 /2) ψ(xt ) G 2 + 2δ ( 2k ψ(x t ) + 2 ( δ 2k ) G ψ(xt ) + ) G 2 ψ(xt ) Dψ(X t ), ( where D = G + 2δ ( ) ) 2 2k ( δ 2k ) + 2 G. Recall that δ 2k < /3. Hence, selecting G > ( + δ 2k )/( 3δ 2k ), we get D <. Also, ψ(x 0 ) = ψ(0) = b 2 /2. Hence, ψ(x τ ) C e 2 + ϵ where τ = log(/d) log b 2 2(C e 2 +ϵ) and C = G 2 /2. Appendix B: Proof of Weak RIP for Matrix Completion We now give a detailed proof of the weak RIP property for the matrix completion problem, Theorem 2.2. Proof of Lemma 2.4 Let X = UΣV T be the singular value decomposition of X. Then, Since X is µ-incoherent, X ij l= X ij = e T i UΣV T e j = Σ ll U il V jl µ mn ( l= U il Σ ll V jl l= Σ ll U il V jl. l= Σ ll ) µ mn k ( We also need the following technical lemma for our proof. Lemma 0. Let a, b, c, x, y, z [, ]. Then, l= abc xyz a x + b y + c z. Σ 2 ll) /2 = µ k mn X F. Proof We have, abc xyz = (abc xbc) + (xbc xyc) + (xyc xyz) a x bc + b y xc + c z xy a x + b y + c z. We now prove Lemma 2.5. We use the following form of the classical Bernstein s large deviation inequality in our proof. Lemma 0.2 (Bernstein s Inequality, see [3]) Let X, X 2,..., X n be independent random variables with E[X i ] = 0 and X i M for all i. Then, P [ ( t 2 ) /2 X i > t] exp i i V ar(x. i) + Mt/3 Proof of Lemma 2.5 For (i, j) [m] [n], let ω ij be the indicator variables with ω ij = if (i, j) Ω and 0 otherwise. Then, ω ij are independent random variables with Pr[ω ij = ] = p. Let random variable Z ij = ω ij Xij 2. Note that, E[Z ij ] = px 2 ij, V ar(z ij ) = p( p)x 4 ij. Since X is α-regular, Z ij E[Z ij ] X ij 2 (α 2 /mn) X 2 F. Thus, M = max Z ij E[Z ij ] α2 mn X 2 F. (0.3) 2
Proof of Lemma 2.5. For $(i,j) \in [m]\times[n]$, let $\omega_{ij}$ be the indicator variable with $\omega_{ij} = 1$ if $(i,j) \in \Omega$ and $0$ otherwise. Then the $\omega_{ij}$ are independent random variables with $\Pr[\omega_{ij} = 1] = p$. Let $Z_{ij} = \omega_{ij}X_{ij}^2$. Note that
$$E[Z_{ij}] = pX_{ij}^2, \qquad \mathrm{Var}(Z_{ij}) = p(1-p)X_{ij}^4.$$
Since $X$ is $\alpha$-regular, $|Z_{ij} - E[Z_{ij}]| \le X_{ij}^2 \le (\alpha^2/mn)\|X\|_F^2$. Thus,
$$M = \max_{ij}\,|Z_{ij} - E[Z_{ij}]| \le \frac{\alpha^2}{mn}\,\|X\|_F^2. \qquad (0.3)$$
Now, define the random variable $S = \sum_{ij} Z_{ij} = \sum_{ij}\omega_{ij}X_{ij}^2 = \|P_\Omega(X)\|_F^2$. Note that $E[S] = p\|X\|_F^2$. Since the $Z_{ij}$ are independent random variables,
$$\mathrm{Var}(S) = p(1-p)\sum_{ij} X_{ij}^4 \le p\,\big(\max_{ij} X_{ij}^2\big)\sum_{ij} X_{ij}^2 \le p\,\frac{\alpha^2}{mn}\,\|X\|_F^4. \qquad (0.4)$$
Using Bernstein's inequality (Lemma 0.2) for $S$ with $t = \delta p\|X\|_F^2$ and Equations (0.3) and (0.4), we get
$$\Pr[|S - E[S]| > t] \le 2\exp\left(-\frac{t^2/2}{\mathrm{Var}(S) + Mt/3}\right) \le 2\exp\left(-\frac{\delta^2 pmn}{2\alpha^2(1+\delta/3)}\right) \le 2\exp\left(-\frac{\delta^2 pmn}{3\alpha^2}\right).$$
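The following small simulation (a sketch only; the matrix size, rank, and sampling probability are arbitrary choices) illustrates the concentration that Lemma 2.5 guarantees, namely that $\|P_\Omega(X)\|_F^2$ stays within a $(1 \pm \delta)$ factor of $p\|X\|_F^2$:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, k, p = 200, 150, 3, 0.3

# A random rank-k matrix; Gaussian factors are incoherent/regular with
# high probability, which is the regime Lemma 2.5 addresses.
X = rng.standard_normal((m, k)) @ rng.standard_normal((k, n))
fro2 = np.linalg.norm(X, 'fro')**2

ratios = []
for _ in range(2000):
    omega = rng.random((m, n)) < p  # each entry sampled independently w.p. p
    S = np.sum((X * omega)**2)      # ||P_Omega(X)||_F^2
    ratios.append(S / (p * fro2))
print(f"min ratio {min(ratios):.3f}, max ratio {max(ratios):.3f}")  # both close to 1
```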
Proof of Lemma 2.6. We construct $S(\mu,\epsilon)$ by discretizing the space of low-rank incoherent matrices. Let $\rho = \epsilon/\sqrt{9k^2mn}$ and $D(\rho) = \{\rho i : i \in \mathbb{Z},\ |i| < \lceil 1/\rho\rceil\}$. Let
$$U(\rho) = \{U \in \mathbb{R}^{m\times k} : U_{ij} \in \sqrt{\mu/m}\;D(\rho)\}, \qquad V(\rho) = \{V \in \mathbb{R}^{n\times k} : V_{ij} \in \sqrt{\mu/n}\;D(\rho)\},$$
$$\Sigma(\rho) = \{\Sigma \in \mathbb{R}^{k\times k} : \Sigma_{ij} = 0 \text{ for } i \ne j,\ \Sigma_{ii} \in D(\rho)\}, \qquad S(\mu,\epsilon) = \{U\Sigma V^T : U \in U(\rho),\ \Sigma \in \Sigma(\rho),\ V \in V(\rho)\}.$$
We will show that $S(\mu,\epsilon)$ satisfies the conditions of the lemma. Observe that $|D(\rho)| < 2/\rho$. Thus,
$$|U(\rho)| < (2/\rho)^{mk}, \qquad |V(\rho)| < (2/\rho)^{nk}, \qquad |\Sigma(\rho)| < (2/\rho)^{k}.$$
Hence, $|S(\mu,\epsilon)| < (2/\rho)^{mk+nk+k} < (mnk/\epsilon)^{3(m+n)k}$.

Fix a $\mu$-incoherent $X \in \mathbb{R}^{m\times n}$ of rank at most $k$ with $\|X\|_2 = 1$, and let $X = U\Sigma V^T$ be its singular value decomposition. Let $U'$ be the matrix obtained by rounding the entries of $U$ to integer multiples of $\sqrt{\mu/m}\,\rho$ as follows: for $(i,l) \in [m]\times[k]$, let
$$(U')_{il} = \sqrt{\mu/m}\;\rho\left\lfloor \frac{U_{il}}{\sqrt{\mu/m}\;\rho} \right\rfloor.$$
Now, since $|U_{il}| \le \sqrt{\mu/m}$, it follows that $U' \in U(\rho)$. Further, for all $i \in [m]$, $l \in [k]$,
$$|(U')_{il} - U_{il}| < \sqrt{\mu/m}\;\rho \le \rho.$$
Similarly, define $V', \Sigma'$ by rounding the entries of $V, \Sigma$ to integer multiples of $\sqrt{\mu/n}\,\rho$ and $\rho$ respectively. Then $V' \in V(\rho)$, $\Sigma' \in \Sigma(\rho)$, and for $(j,l) \in [n]\times[k]$,
$$|(V')_{jl} - V_{jl}| < \sqrt{\mu/n}\;\rho \le \rho, \qquad |(\Sigma')_{ll} - \Sigma_{ll}| < \rho.$$
Let $X(\rho) = U'\Sigma' V'^T$. Then, by the above equations and Lemma 0.1, for $i \in [m]$, $l \in [k]$, $j \in [n]$,
$$|(U')_{il}(\Sigma')_{ll}(V')_{jl} - U_{il}\Sigma_{ll}V_{jl}| < 3\rho.$$
Thus, for $(i,j) \in [m]\times[n]$,
$$|X(\rho)_{ij} - X_{ij}| = \Big|\sum_{l=1}^k \big[(U')_{il}(\Sigma')_{ll}(V')_{jl} - U_{il}\Sigma_{ll}V_{jl}\big]\Big| \le \sum_{l=1}^k \big|(U')_{il}(\Sigma')_{ll}(V')_{jl} - U_{il}\Sigma_{ll}V_{jl}\big| < 3k\rho. \qquad (0.5)$$
Using Lemma 2.4 and Equation (0.5),
$$\max_{ij}|X(\rho)_{ij}| \le \max_{ij}|X_{ij}| + 3k\rho \le \frac{\mu\sqrt{k}}{\sqrt{mn}}\,\|X\|_F + \frac{\epsilon}{\sqrt{mn}}.$$
Also, using (0.5),
$$\|X(\rho) - X\|_F^2 = \sum_{ij}|X(\rho)_{ij} - X_{ij}|^2 < 9k^2mn\rho^2 = \epsilon^2.$$
Further, by the triangle inequality, $\|X(\rho)\|_F \ge \|X\|_F - \epsilon \ge \|X\|_F/2$, since $\epsilon \le 1/2 \le \|X\|_F/2$ (recall $\|X\|_2 = 1$, so $\|X\|_F \ge 1$). Since $\epsilon < 1 \le \mu\sqrt{k}\,\|X\|_F$,
$$\max_{ij}|X(\rho)_{ij}| \le \frac{2\mu\sqrt{k}}{\sqrt{mn}}\,\|X\|_F \le \frac{4\mu\sqrt{k}}{\sqrt{mn}}\,\|X(\rho)\|_F.$$
Thus, $X(\rho)$ is $4\mu\sqrt{k}$-regular. The lemma now follows by taking $Y = X(\rho)$.

We now prove Theorem 2.2 by combining Lemmas 2.5 and 2.6.

Proof of Theorem 2.2. Let $m \le n$, $\epsilon = \delta/(9mnk)$, and
$$S'(\mu,\epsilon) = \{Y : Y \in S(\mu,\epsilon),\ Y \text{ is } 4\mu\sqrt{k}\text{-regular}\},$$
where $S(\mu,\epsilon)$ is as in Lemma 2.6. Then, by Lemma 2.5 (applied with $\alpha = 4\mu\sqrt{k}$) and a union bound,
$$\Pr\left[\big|\|P_\Omega(Y)\|_F^2 - p\|Y\|_F^2\big| \ge \delta p\|Y\|_F^2 \text{ for some } Y \in S'(\mu,\epsilon)\right] \le 2\left(\frac{mnk}{\epsilon}\right)^{3(m+n)k}\exp\left(-\frac{\delta^2 pmn}{48\mu^2k}\right) \le \exp(C_1 nk\log n)\exp\left(-\frac{\delta^2 pmn}{48\mu^2k}\right),$$
where $C_1 > 0$ is a constant independent of $m, n, k$. Thus, if $p > C\mu^2k^2\log n/(\delta^2 m)$, where $C = 48(C_1+1)$, then with probability at least $1 - \exp(-n\log n)$ the following holds:
$$\forall Y \in S'(\mu,\epsilon), \qquad \big|\|P_\Omega(Y)\|_F^2 - p\|Y\|_F^2\big| \le \delta p\|Y\|_F^2. \qquad (0.6)$$
As the statement of the theorem is invariant under scaling, it is enough to prove it for all $\mu$-incoherent matrices $X$ of rank at most $k$ with $\|X\|_2 = 1$. Fix such an $X$ and suppose that (0.6) holds. Now, by Lemma 2.6 there exists $Y \in S'(\mu,\epsilon)$ such that $\|Y - X\|_F \le \epsilon$. Moreover, since $\|X\|_F \le \sqrt{k}$,
$$\|Y\|_F^2 \le (\|X\|_F + \epsilon)^2 \le \|X\|_F^2 + 2\epsilon\|X\|_F + \epsilon^2 \le \|X\|_F^2 + 3\epsilon k.$$
Proceeding similarly, we can show that
$$\big|\|X\|_F^2 - \|Y\|_F^2\big| \le 3\epsilon k. \qquad (0.7)$$
Further, starting with $\|P_\Omega(Y - X)\|_F \le \|Y - X\|_F \le \epsilon$ and arguing as above, we get that
$$\big|\|P_\Omega(Y)\|_F^2 - \|P_\Omega(X)\|_F^2\big| \le 3\epsilon k. \qquad (0.8)$$
Combining inequalities (0.7) and (0.8) above, we have
$$\big|\|P_\Omega(X)\|_F^2 - p\|X\|_F^2\big| \le \big|\|P_\Omega(X)\|_F^2 - \|P_\Omega(Y)\|_F^2\big| + p\,\big|\|X\|_F^2 - \|Y\|_F^2\big| + \big|\|P_\Omega(Y)\|_F^2 - p\|Y\|_F^2\big|$$
$$\le 6\epsilon k + \delta p\|Y\|_F^2 \qquad \text{from (0.6), (0.7), (0.8)}$$
$$\le 6\epsilon k + \delta p\big(\|X\|_F^2 + 3\epsilon k\big) \qquad \text{from (0.7)}$$
$$\le 9\epsilon k + \delta p\|X\|_F^2 \le 2\delta p\|X\|_F^2, \qquad \text{since } \|X\|_F \ge 1 \text{ and } 9\epsilon k = \delta/(mn) \le \delta p.$$
The theorem now follows.
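As a sanity check on the discretization underlying Lemma 2.6 (again a sketch; $m$, $n$, $k$, $\epsilon$, and the random test matrix are our choices), the following code rounds the SVD factors of a matrix with $\|X\|_2 = 1$ to the grid with resolution $\rho$ and verifies $\|X(\rho) - X\|_F \le \epsilon$:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, k, eps = 60, 50, 2, 0.1
rho = eps / np.sqrt(9 * k**2 * m * n)

# Random rank-k matrix with ||X||_2 = 1 via its SVD factors.
U, _ = np.linalg.qr(rng.standard_normal((m, k)))
V, _ = np.linalg.qr(rng.standard_normal((n, k)))
sigma = np.sort(rng.random(k))[::-1]
sigma /= sigma[0]                      # top singular value = 1

# Empirical incoherence parameter: |U_il| <= sqrt(mu/m), |V_jl| <= sqrt(mu/n).
mu = max(np.max(U**2) * m, np.max(V**2) * n)

def round_to(A, step):
    """Round each entry down to an integer multiple of `step`."""
    return step * np.floor(A / step)

X = U @ np.diag(sigma) @ V.T
X_rho = (round_to(U, np.sqrt(mu / m) * rho)
         @ np.diag(round_to(sigma, rho))
         @ round_to(V, np.sqrt(mu / n) * rho).T)

print(np.linalg.norm(X_rho - X, 'fro'), "<=", eps)  # Frobenius error below eps
```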
Appendix C: SVP-Newton

The affine rank minimization problem is a natural generalization to matrices of the following compressed sensing problem (CSP) for vectors:
$$\min_x \|x\|_0 \quad \text{s.t. } Ax = b, \qquad (0.9)$$
where $\|x\|_0$ is the $\ell_0$ norm (the size of the support) of $x \in \mathbb{R}^n$, $A \in \mathbb{R}^{m\times n}$ is the sensing matrix, and $b \in \mathbb{R}^m$ is the vector of measurements. Like ARMP, the compressed sensing problem is NP-hard in general; also like ARMP, it can be solved for measurement matrices that satisfy the restricted isometry property. The Restricted Isometry Property (RIP) for the compressed sensing problem is analogous to the corresponding definition for matrices:
$$(1-\delta_k)\|x\|_2^2 \le \|Ax\|_2^2 \le (1+\delta_k)\|x\|_2^2, \qquad (0.10)$$
for all $x$ with $|\mathrm{support}(x)| \le k$. Note that the support of a vector equals the rank of the corresponding diagonal matrix whose diagonal is given by that vector. Thus, compressed sensing is a special case of the affine rank minimization problem (ARMP) restricted to diagonal matrices, i.e., with fixed basis vectors $U_k = I_k$, $V_k = I_k$ (see step 4 of the SVP algorithm), and SVP-Newton can be applied directly to Problem (0.9). The corresponding updates for SVP-Newton applied to compressed sensing are:
$$y^{t+1} \leftarrow x^t - \eta_t A^T(Ax^t - b), \qquad (0.11)$$
$$S \leftarrow \text{the set of the top } k \text{ elements (in magnitude) of } y^{t+1}, \qquad (0.12)$$
$$x^{t+1}_S \leftarrow \mathop{\mathrm{argmin}}_{x_S}\ \|A_S x_S - b\|_2^2, \qquad x^{t+1}_{\bar S} \leftarrow 0. \qquad (0.13)$$
We now present a theorem showing that the SVP-Newton method applied to the compressed sensing problem converges to the optimal solution in $O(\log k)$ iterations, where $k$ is the optimal support size.

Theorem 0.3. Suppose the isometry constant of $A$ in CSP satisfies $\delta_{2k} < 1/3$. Let $b = Ax^*$ for a support-$k$ vector $x^*$, i.e., $\|x^*\|_0 = k$, with $|b_i| \le 1$ for all $i$. Then the SVP-Newton algorithm for CSP (updates (0.11)-(0.13)) with step size $\eta_t = 3/4$ converges to the exact $x^*$ in at most
$$\frac{1}{\log(1/\alpha)}\,\log\frac{\|b\|_2^2}{(1-\delta_{2k})\,c^2}$$
iterations, where $c = \min_{i \in \mathrm{supp}(x^*)}|x^*_i|$ and $\alpha = \frac{1/3+\delta_{2k}}{1-\delta_{2k}}$.

Proof. Assume that $x^0 = 0$. Using Lemma 2.1 specialized to CSP (or see Theorem 2.1 in Garg and Khandekar [1]),
$$(1-\delta_{2k})\,\|x^t - x^*\|_2^2 \le \|Ax^t - b\|_2^2 \le \alpha^t\,\|b\|_2^2, \quad \text{where } \alpha = \frac{1/3+\delta_{2k}}{1-\delta_{2k}}.$$
Suppose that the support set $S$ discovered by SVP-Newton at the $t$-th step differs from the optimal support $S^*$ (the support of $x^*$) in at least one element. (Note that if $S = S^*$, then $x^t = x^*$, since $x^t$ minimizes $\|A_S x_S - b\|_2^2$.) Since $S^* \setminus S \ne \emptyset$, we have $\|x^t - x^*\|_2^2 \ge c^2$. Hence, $(1-\delta_{2k})c^2 \le \alpha^t\|b\|_2^2$, implying
$$t \le \frac{1}{\log(1/\alpha)}\,\log\frac{\|b\|_2^2}{(1-\delta_{2k})\,c^2}.$$

Note that SVP-Newton converges to the exact optimal solution in a logarithmic number of iterations, rather than to the $\epsilon$-approximate solution obtained by GraDeS [1], which is itself just a specialization of SVP to the compressed sensing problem. Note also that the number of iterations required is an improvement over the $O(k)$ bound provided by [2]. Moreover, [2] requires an incoherence property for the measurement matrix, which is a stronger condition than RIP; in fact, one can construct measurement matrices $A$ with bounded RIP constant but unbounded incoherence constant. (The incoherence property mentioned here is different from the one used in the context of matrix completion.)
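A minimal sketch of updates (0.11)-(0.13) follows (illustrative only: the Gaussian sensing matrix, dimensions, and sparsity level are assumptions of ours, and NumPy's least-squares routine stands in for the exact minimization in (0.13)):

```python
import numpy as np

def svp_newton_csp(A, b, k, eta=0.75, iters=50):
    """SVP-Newton for compressed sensing: gradient step, keep the top-k
    coordinates in magnitude, then solve least squares on that support."""
    m, n = A.shape
    x = np.zeros(n)
    for _ in range(iters):
        y = x - eta * A.T @ (A @ x - b)                       # update (0.11)
        S = np.argsort(np.abs(y))[-k:]                        # update (0.12)
        x = np.zeros(n)
        x[S] = np.linalg.lstsq(A[:, S], b, rcond=None)[0]     # update (0.13)
    return x

# Illustrative run: recover a k-sparse vector from Gaussian measurements.
rng = np.random.default_rng(3)
m, n, k = 80, 200, 5
A = rng.standard_normal((m, n)) / np.sqrt(m)  # approximately RIP w.h.p.
x_star = np.zeros(n)
x_star[rng.choice(n, k, replace=False)] = rng.choice([-1.0, 1.0], k)
b = A @ x_star
print(np.linalg.norm(svp_newton_csp(A, b, k) - x_star))  # ~0 on success
```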
References

[1] Rahul Garg and Rohit Khandekar. Gradient descent with sparsification: an iterative algorithm for sparse recovery with restricted isometry property. In ICML, 2009.

[2] Arian Maleki. Coherence analysis of iterative thresholding algorithms. CoRR.

[3] Wikipedia. Bernstein inequalities (probability theory). URL http://en.wikipedia.org/wiki/Bernstein_inequalities_(probability_theory). [Online; accessed 16 April 2009].