TRANSFORMED SCHATTEN-1 ITERATIVE THRESHOLDING ALGORITHMS FOR LOW RANK MATRIX COMPLETION


TRANSFORMED SCHATTEN-1 ITERATIVE THRESHOLDING ALGORITHMS FOR LOW RANK MATRIX COMPLETION

SHUAI ZHANG, PENGHANG YIN, AND JACK XIN

(The work was partially supported by two NSF DMS grants. Department of Mathematics, University of California, Irvine, CA, 92697, USA. szhang3@uci.edu, penghany@uci.edu, jxin@math.uci.edu.)

Abstract. We study a non-convex low-rank promoting penalty function, the transformed Schatten-1 (TS), and its applications in matrix completion. The TS penalty, as a matrix quasi-norm defined on its singular values, interpolates the rank and the nuclear norm through a nonnegative parameter $a \in (0,+\infty)$. We consider the unconstrained TS regularized low-rank matrix recovery problem and develop a fixed point representation for its global minimizer. The TS thresholding functions are in closed analytical form for all parameter values. The TS threshold values differ in the subcritical (supercritical) parameter regime, where the TS threshold functions are continuous (discontinuous). We propose TS iterative thresholding algorithms and compare them with some state-of-the-art algorithms on matrix completion test problems. For problems with known rank, a fully adaptive TS iterative thresholding algorithm consistently performs the best under different conditions, where the ground truth matrices are generated by multivariate Gaussian, (0,1) uniform and Chi-square distributions. For problems with unknown rank, TS algorithms with an additional rank estimation procedure approach the level of IRucL-q, an iterative reweighted algorithm, non-convex in nature and best in performance.

Key words. Transformed Schatten-1 penalty, fixed point representation, closed form thresholding function, iterative thresholding algorithms, matrix completion.

AMS subject classifications. 90C26, 90C46.

1. Introduction
Low rank matrix completion problems arise in many applications such as collaborative filtering in recommender systems [4, 7], minimum order system and low-dimensional Euclidean embedding problems in control theory [4, 5], network localization [8], and others [26]. The mathematical problem is
$$\min_{X \in \mathbb{R}^{m\times n}} \mathrm{rank}(X) \quad \text{s.t.} \quad X \in L, \tag{1.1}$$
where $L$ is a convex set. In this paper, we are interested in methods for solving the affine rank minimization problem (ARMP)
$$\min_{X \in \mathbb{R}^{m\times n}} \mathrm{rank}(X) \quad \text{s.t.} \quad \mathcal{A}(X) = b \in \mathbb{R}^{p}, \tag{1.2}$$
where the linear transformation $\mathcal{A}: \mathbb{R}^{m\times n} \to \mathbb{R}^{p}$ and the vector $b \in \mathbb{R}^{p}$ are given. The matrix completion problem
$$\min_{X \in \mathbb{R}^{m\times n}} \mathrm{rank}(X) \quad \text{s.t.} \quad X_{i,j} = M_{i,j}, \ (i,j) \in \Omega, \tag{1.3}$$
is a special case of (1.2), where $X$ and $M$ are both $m\times n$ matrices and $\Omega$ is a subset of index pairs $\{(i,j)\}$. The optimization problems above are known to be NP-hard. Many alternative penalties have been utilized as proxies for finding low rank solutions, in both the constrained and unconstrained settings:
$$\min_{X \in \mathbb{R}^{m\times n}} F(X) \quad \text{s.t.} \quad \mathcal{A}(X) = b \tag{1.4}$$
and
$$\min_{X \in \mathbb{R}^{m\times n}} \frac12\|\mathcal{A}(X) - b\|_2^2 + \lambda F(X). \tag{1.5}$$
A small numerical sketch of how such a completion instance is set up is given below.
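To make the setting of (1.3)–(1.5) concrete, the following short numpy sketch (our illustration; all names, sizes and parameter values are arbitrary choices, not taken from the paper) builds a synthetic completion instance: a random rank-$r$ ground truth $M = M_L M_R^T$, a uniformly sampled observation set $\Omega$, and the observed values $b$.

```python
import numpy as np

def make_completion_instance(m=100, n=100, r=5, sr=0.4, seed=0):
    """Rank-r ground truth M = M_L @ M_R.T plus a uniformly sampled mask Omega (fraction sr)."""
    rng = np.random.default_rng(seed)
    M = rng.standard_normal((m, r)) @ rng.standard_normal((n, r)).T   # rank(M) <= r
    p = int(round(sr * m * n))                                        # number of observations
    flat = rng.choice(m * n, size=p, replace=False)
    mask = np.zeros(m * n, dtype=bool)
    mask[flat] = True
    mask = mask.reshape(m, n)                                         # Omega as a boolean mask
    b = M[mask]                                                       # observed entries A(M) = b
    return M, mask, b

M, mask, b = make_completion_instance()
print(M.shape, int(mask.sum()), np.linalg.matrix_rank(M))
```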

The penalty function $F(\cdot)$ is defined on the singular values of the matrix $X$, typically $F(X) = \sum_i f(\sigma_i)$, where $\sigma_i$ is the $i$-th largest singular value of $X$, arranged in descending order. The Schatten $p$-norm (nuclear norm at $p=1$) results when $f(x) = x^p$, $p \in (0,1]$; at $p = 0$ ($p = 2$), $F$ is the rank (Frobenius norm). Recovering the rank under suitable conditions for $p \in (0,1]$ has been extensively studied in theory and algorithms [2, 3, 4, 9, 2, 2, 23, 24, 28]. Non-convex penalty based methods have shown better performance on hard problems [2, 24]. There is also a novel method for solving the constrained problem (1.4) from the perspective of gauge duality [32, 33].

Recently, a class of $\ell_1$ based non-convex penalties, the transformed $\ell_1$ (TL), has been found effective and robust for compressed sensing problems [3, 3]. TL interpolates $\ell_0$ and $\ell_1$, similarly to the $\ell_p$ quasi-norm ($p \in (0,1)$). In the entire range of the interpolation parameter, TL enjoys a closed form iterative thresholding function, which is available for $\ell_p$ only at a few specific values such as $p = 0, 1, 1/2, 2/3$; see [, 5, 7, 29]. This feature allows TL to perform fast and robust sparse minimization over a much wider range than the $\ell_p$ quasi-norm. Moreover, the TL penalty possesses unbiasedness and Lipschitz continuity besides sparsity [2, 22]. It is the goal of this paper to extend the TL penalty to TS (transformed Schatten-1) for low rank matrix completion and compare it with state-of-the-art methods in the literature.

The rest of the paper is organized as follows. In section 2, we present the transformed Schatten-1 function (TS), the TS regularized minimization problems, and a derivation of the thresholding representation of the global minimum. In section 3, we propose two thresholding algorithms (TS-s and TS-s2) based on a fixed point equation of the global minimum. In section 4, we compare the TS algorithms with some state-of-the-art algorithms through numerical experiments in low rank matrix recovery and image inpainting. Concluding remarks are in section 5.

1.1. Notation
Here we set the notation for this paper. Two kinds of inner products are used in the following sections, one between vectors and the other between matrices:
$$(x,y) = \sum_i x_i y_i \ \text{ for vectors } x, y; \qquad \langle X, Y\rangle = \mathrm{tr}(Y^T X) = \sum_{i,j} X_{i,j} Y_{i,j} \ \text{ for matrices } X, Y.$$
Assume the matrix $X \in \mathbb{R}^{m\times n}$ has $r$ positive singular values $\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_r > 0$. Let us introduce some common matrix norms or quasi-norms (a small numerical sketch of these quantities follows below):
Nuclear norm: $\|X\|_* = \sum_{i=1}^r \sigma_i$;
Schatten $p$ quasi-norm: $\|X\|_p = \left(\sum_{i=1}^r \sigma_i^p\right)^{1/p}$, for $p \in (0,1)$;
Frobenius norm: $\|X\|_F = \left(\sum_{i=1}^r \sigma_i^2\right)^{1/2}$; also $\|X\|_F^2 = \langle X, X\rangle = \sum_{i,j} X_{i,j}^2$;
Ky Fan $k$-norm: $\|X\|_{F_k} = \sum_{i=1}^k \sigma_i$, for $1 \le k \le r$;
Induced $L_2$ norm: $\|X\|_{L_2} = \max_{\|v\|_2 = 1}\|Xv\|_2 = \sigma_1$.
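All of the quantities above are functions of the singular values. The sketch below (our own convenience helper; the default values of p and k are only illustrative) computes them with numpy.

```python
import numpy as np

def matrix_norms(X, p=0.5, k=3):
    """Nuclear, Schatten-p, Frobenius, Ky Fan k and induced L2 norms via the singular values."""
    s = np.linalg.svd(X, compute_uv=False)        # singular values in descending order
    return {
        "nuclear": s.sum(),
        "schatten_p": (s ** p).sum() ** (1.0 / p),
        "frobenius": np.sqrt((s ** 2).sum()),     # equals np.linalg.norm(X, 'fro')
        "ky_fan_k": s[:k].sum(),
        "induced_L2": s[0],
    }
```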

Define the function $\mathrm{vec}(\cdot)$ to unfold a matrix columnwise into a vector. It is then clear that $\|\mathrm{vec}(X)\|_2 = \|X\|_F$, where the norm on the left-hand side is the vector $\ell_2$ norm. Define the shrinkage identity matrix $I^s_k \in \mathbb{R}^{m\times n}$ as follows:
$$I^s_k(i,i) = 1 \ \text{for the first } k \text{ diagonal elements}; \qquad I^s_k(i,j) = 0 \ \text{otherwise}. \tag{1.6}$$
The operator $\mathrm{tr}_k(\cdot)$ is defined as the first-$k$ partial trace of a matrix,
$$\mathrm{tr}_k(X) = \sum_{i=1}^k X_{i,i}. \tag{1.7}$$
The following matrix functions will be used in the proofs of the next section, and we record them here for reference:
$$\begin{aligned}
C_\lambda(X) &= \tfrac12\|\mathcal{A}(X) - b\|_2^2 + \lambda T(X);\\
C_{\lambda,\mu}(X,Z) &= \mu\left\{C_\lambda(X) - \tfrac12\|\mathcal{A}(X) - \mathcal{A}(Z)\|_2^2\right\} + \tfrac12\|X - Z\|_F^2\\
&= \mu\lambda T(X) + \tfrac{\mu}{2}\|b\|_2^2 - \tfrac{\mu}{2}\|\mathcal{A}(Z)\|_2^2 - \mu(\mathcal{A}(X),\, b - \mathcal{A}(Z)) + \tfrac12\|X - Z\|_F^2;\\
B_\mu(Z) &= Z + \mu\,\mathcal{A}^*(b - \mathcal{A}(Z)).
\end{aligned} \tag{1.8}$$

2. TS minimization and thresholding representation
First, let us introduce the transformed Schatten-1 penalty function (TS), based on the singular values of a matrix:
$$T(X) = \sum_{i=1}^{\mathrm{rank}(X)} \rho_a(\sigma_i), \tag{2.1}$$
where $\rho_a(\cdot)$ is a linear-to-linear rational function with parameter $a \in (0,\infty)$ [3, 3],
$$\rho_a(|x|) = \frac{(a+1)|x|}{a + |x|}. \tag{2.2}$$
With the change of the parameter $a$, TL interpolates the $\ell_0$ and $\ell_1$ norms:
$$\lim_{a\to 0^+}\rho_a(x) = \mathbb{1}_{\{x\neq 0\}}, \qquad \lim_{a\to +\infty}\rho_a(x) = |x|.$$
In Fig. 2.1, level lines of TL on the plane are shown at small and large values of the parameter $a$, resembling those of $\ell_1$ (for large $a$), $\ell_{1/2}$ (for moderate $a$), and $\ell_0$ (for small $a$). We shall focus on the TS regularized problem
$$\min_{X\in\mathbb{R}^{m\times n}} \frac12\|\mathcal{A}(X) - b\|_2^2 + \lambda T(X), \tag{2.3}$$
where the linear transform $\mathcal{A}: \mathbb{R}^{m\times n}\to\mathbb{R}^p$ can be determined by $p$ given matrices $A_1,\dots,A_p \in \mathbb{R}^{m\times n}$, that is, $\mathcal{A}(X) = (\langle A_1, X\rangle, \dots, \langle A_p, X\rangle)^T$. (A small numerical sketch of the penalty $T(X)$ follows below.)
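As a quick illustration of the definitions (2.1)–(2.2), the following sketch evaluates $\rho_a$ and $T(X)$ with numpy; the function names are ours, and the check at the end only exercises the two limits $a \to 0^+$ (rank) and $a \to +\infty$ (nuclear norm).

```python
import numpy as np

def rho_a(x, a):
    """Transformed l1 function rho_a(|x|) = (a+1)|x| / (a+|x|), applied entrywise."""
    x = np.abs(np.asarray(x, dtype=float))
    return (a + 1.0) * x / (a + x)

def ts_penalty(X, a):
    """TS penalty T(X) of (2.1): rho_a summed over the singular values of X."""
    sigma = np.linalg.svd(X, compute_uv=False)
    return rho_a(sigma, a).sum()

# Two limiting checks: small a approaches rank(X), large a approaches the nuclear norm.
X = np.random.randn(40, 5) @ np.random.randn(5, 30)
print(ts_penalty(X, 1e-6), np.linalg.matrix_rank(X))
print(ts_penalty(X, 1e6), np.linalg.svd(X, compute_uv=False).sum())
```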

Fig. 2.1: Level lines of TL with different parameters $a$: panel (a) shows the $\ell_1$ level lines for comparison, and panels (b)–(d) show TL for decreasing values of $a$. For a large parameter $a$ the graph looks almost the same as that of $\ell_1$ (panel a), while for small values of $a$ it tends toward the coordinate axes.

2.1. Overview of TL minimization
To set the stage for the discussion of the TS regularized problem (2.3), we review the following results on one-dimensional TL optimization [3]. Let us consider the unconstrained TL regularized problem:
$$\min_{x\in\mathbb{R}^n} \frac12\|Ax - y\|_2^2 + \lambda P_a(x), \tag{2.4}$$
where the matrix $A\in\mathbb{R}^{m\times n}$ and the vector $y\in\mathbb{R}^m$ are given, $P_a(x) = \sum_i \rho_a(|x_i|)$, and the function $\rho_a(\cdot)$ is as in (2.2). In this subsection on TL minimization, we overload the operator $B_\mu(\cdot)$ to act on a vector $x$, instead of on a matrix as in (1.8):
$$B_\mu(x) = x + \mu A^T(y - Ax). \tag{2.5}$$
In Theorem 2.1 below, a closed form expression is given for the proximal operator $\mathrm{prox}_{\lambda\rho_a}$ of the univariate TL regularization problem, where
$$\mathrm{prox}_{\lambda\rho_a}(x) = \arg\min_{y\in\mathbb{R}} \ \frac12(y - x)^2 + \lambda\rho_a(|y|).$$
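As a bridge to the closed-form results of the next subsection, here is a minimal proximal-gradient sketch for problem (2.4), built around the operator $B_\mu$ of (2.5). It is our own illustration: the proximal map is passed in as an argument, to be instantiated either with soft-thresholding (the $\ell_1$ case, shown as an example) or with the closed-form TL thresholding of Theorem 2.1 below.

```python
import numpy as np

def prox_gradient(A, y, lam, prox, mu=None, iters=500):
    """Iterate x_{k+1} = prox(B_mu(x_k), lam*mu) with B_mu(x) = x + mu*A^T(y - A x)."""
    m, n = A.shape
    if mu is None:
        mu = 0.99 / np.linalg.norm(A, 2) ** 2       # step size mu < ||A||_2^{-2}
    x = np.zeros(n)
    for _ in range(iters):
        x = prox(x + mu * A.T @ (y - A @ x), lam * mu)
    return x

# Example proximal map: soft-thresholding, i.e. the l1 (a -> infinity) limit of TL.
soft_threshold = lambda w, t: np.sign(w) * np.maximum(np.abs(w) - t, 0.0)
```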

The proximal operator of a convex function is usually intended to solve a small convex regularization problem, which often admits a closed-form formula or an efficient specialized numerical method. However, for non-convex functions, such as $\ell_p$ with $p \in (0,1)$, the related proximal operators do not have closed form solutions in general. There are many iterative algorithms to approximate an optimal solution, but they need more computing time and sometimes converge only to local optima or stationary points. In this subsection, we show that for the TL function there indeed exists a closed-form formula for its optimal solution. Differently from other thresholding operators, TL has two threshold value formulas, depending on the regularization parameter $\lambda$ and the TL parameter $a$. We present them here with the same notation as [3]:
$$t^*_2 = \lambda\,\frac{a+1}{a} \ \text{(sub-critical parameter)}, \qquad t^*_3 = \sqrt{2\lambda(a+1)} - \frac{a}{2} \ \text{(super-critical parameter)}. \tag{2.6}$$
The inequality $t^*_3 \le t^*_2$ holds, and the equality is realized if and only if $\lambda = \frac{a^2}{2(a+1)}$; see [3]. Let $\mathrm{sgn}(\cdot)$ be the standard signum function with $\mathrm{sgn}(0) = 0$, and
$$h_\lambda(x) = \mathrm{sgn}(x)\left\{\frac{2}{3}(a + |x|)\cos\!\left(\frac{\varphi(x)}{3}\right) - \frac{2a}{3} + \frac{|x|}{3}\right\} \tag{2.7}$$
with $\varphi(x) = \arccos\!\left(1 - \frac{27\lambda a(a+1)}{2(a+|x|)^3}\right)$. In general, $|h_\lambda(x)| \le |x|$; see [3].

Theorem 2.1. ([3]) The optimal solution of $y^* = \arg\min_{y\in\mathbb{R}}\{\frac12(y-x)^2 + \lambda\rho_a(|y|)\}$ is a thresholding function of the form
$$y^* = \begin{cases} 0, & |x| \le t^*,\\ h_\lambda(x), & |x| > t^*,\end{cases} \tag{2.8}$$
where $h_\lambda(\cdot)$ is defined in (2.7), and the threshold parameter $t^*$ depends on $\lambda$ as follows:
1. if $\lambda \le \frac{a^2}{2(a+1)}$ (sub-critical and critical), $t^* = t^*_2 = \lambda\frac{a+1}{a}$;
2. if $\lambda > \frac{a^2}{2(a+1)}$ (super-critical), $t^* = t^*_3 = \sqrt{2\lambda(a+1)} - \frac{a}{2}$.

According to the above theorem, we introduce the thresholding operator $g_{\lambda,a}(\cdot)$ on $\mathbb{R}$,
$$g_{\lambda,a}(w) = \begin{cases} 0, & \text{if } |w| \le t^*,\\ h_\lambda(w), & \text{if } |w| > t^*,\end{cases} \tag{2.9}$$
where $t^*$ is the threshold value in Theorem 2.1 and $h_\lambda(\cdot)$ is as in (2.7). In [3], the authors proved that when $\lambda < \frac{a^2}{2(a+1)}$ the TL thresholding function is continuous, like the soft-thresholding function [8, 9], while if $\lambda > \frac{a^2}{2(a+1)}$ the TL thresholding function has a jump discontinuity at the threshold value, similar to the half-thresholding function [29]. Among different thresholding schemes, it is believed that a continuous formula is more stable, while a discontinuous formula separates nonzero and trivial coefficients more efficiently and sometimes converges faster. (A short numerical sketch of $g_{\lambda,a}$ is given below.)
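The closed forms (2.6)–(2.9) translate directly into code. The sketch below is our reading of those formulas (not the authors' released implementation), vectorized with numpy; the `clip` guards `arccos` against round-off just outside $[-1,1]$.

```python
import numpy as np

def tl1_threshold(w, lam, a):
    """Closed-form TL thresholding g_{lam,a}(w) of (2.9), applied entrywise to w."""
    w = np.asarray(w, dtype=float)
    if lam <= a * a / (2.0 * (a + 1.0)):            # sub-critical: continuous, t = t2*
        t = lam * (a + 1.0) / a
    else:                                           # super-critical: jump at t = t3*
        t = np.sqrt(2.0 * lam * (a + 1.0)) - a / 2.0
    absw = np.abs(w)
    # h_lambda(w) of (2.7)
    c = 1.0 - 27.0 * lam * a * (a + 1.0) / (2.0 * (a + absw) ** 3)
    phi = np.arccos(np.clip(c, -1.0, 1.0))
    h = np.sign(w) * (2.0 / 3.0 * (a + absw) * np.cos(phi / 3.0) - 2.0 * a / 3.0 + absw / 3.0)
    return np.where(absw > t, h, 0.0)

# Small entries are set to zero; large ones survive with a mild shrinkage (|h(w)| <= |w|).
print(tl1_threshold([-3.0, -0.2, 0.05, 1.5], lam=0.3, a=1.0))
```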

We have the following representation theorem for the TL regularized problem (2.4).

Theorem 2.2. ([3]) If $x^* = (x^*_1, x^*_2, \dots, x^*_n)^T$ is a TL regularized solution of (2.4) with $a$ and $\lambda$ positive constants, and $0 < \mu < \|A\|_2^{-2}$, then, letting
$$t^* = t^*_2\,\mathbb{1}_{\{\lambda\mu \le \frac{a^2}{2(a+1)}\}} + t^*_3\,\mathbb{1}_{\{\lambda\mu > \frac{a^2}{2(a+1)}\}}$$
(with $t^*_2$, $t^*_3$ as in (2.6) with $\lambda$ replaced by $\lambda\mu$), the optimal solution satisfies the fixed point equation
$$x^*_i = g_{\lambda\mu,a}\big([B_\mu(x^*)]_i\big), \qquad i = 1,\dots,n. \tag{2.10}$$
In the following, we extend this result to TS low rank matrix completion and propose two thresholding algorithms based on it.

2.2. TS thresholding representation theory
Here we assume $m \le n$. For a matrix $X \in \mathbb{R}^{m\times n}$ with rank equal to $r$, its singular value vector $\sigma = (\sigma_1,\dots,\sigma_m)$ is arranged as $\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_r > 0 = \sigma_{r+1} = \dots = \sigma_m$. The singular value decomposition (SVD) is $X = U D V^T$, where $U = (U_{i,j})_{m\times m}$ and $V = (V_{i,j})_{n\times n}$ are unitary matrices, and $D = \mathrm{Diag}(\sigma) \in \mathbb{R}^{m\times n}$ is diagonal. In [3], Ky Fan proved the dominance theorem and derived the following Ky Fan k-norm inequality.

Lemma 2.1. (Ky Fan k-norm inequality) For a matrix $X \in \mathbb{R}^{m\times n}$ with SVD $X = U D V^T$, where the diagonal elements of $D$ are arranged in decreasing order, we have
$$\langle X, I^s_k\rangle \le \langle D, I^s_k\rangle, \quad \text{that is,} \quad \mathrm{tr}_k(X) \le \mathrm{tr}_k(D) = \|X\|_{F_k}, \qquad k = 1,2,\dots,m.$$
The inequalities become equalities if and only if $X = D$. Here the matrix $I^s_k$ and the operator $\mathrm{tr}_k(\cdot)$ are defined in section 1.1. Another proof of this inequality, without using the dominance theorem, is given in the appendix for the reader's convenience, making the paper self-contained.

Theorem 2.3. Consider a matrix $Y \in \mathbb{R}^{m\times n}$ that admits a singular value decomposition of the form $Y = U\,\mathrm{Diag}(\sigma)\,V^T$, where $\sigma = (\sigma_1,\dots,\sigma_m)$. Then the unique global minimizer of $\min_{X\in\mathbb{R}^{m\times n}} \frac12\|X - Y\|_F^2 + \lambda T(X)$ is
$$X_s = G_{\lambda,a}(Y) = U\,\mathrm{Diag}(g_{\lambda,a}(\sigma))\,V^T, \tag{2.11}$$
where $g_{\lambda,a}(\cdot)$ is defined in (2.9) and applied entrywise to $\sigma$.

Proof. First, due to the unitary invariance of the Frobenius norm and $Y = U\,\mathrm{Diag}(\sigma)V^T$, we have
$$\frac12\|X - Y\|_F^2 + \lambda T(X) = \frac12\|U^T X V - \mathrm{Diag}(\sigma)\|_F^2 + \lambda T(U^T X V).$$
So
$$X_s = \arg\min_{X\in\mathbb{R}^{m\times n}} \frac12\|X - Y\|_F^2 + \lambda T(X) = U\left\{\arg\min_{X\in\mathbb{R}^{m\times n}} \frac12\|X - \mathrm{Diag}(\sigma)\|_F^2 + \lambda T(X)\right\}V^T. \tag{2.12}$$

7 S. Zhang, P. Yin, J. Xin 7 Next we want to show: argmin 2 X Diag(σ) 2 F +λt (X) X R m n = argmin {D R m n is diagonal} 2 D Diag(σ) 2 F +λt (D) (2.3) For any X R m n, suppose it admits SVD: X = U x Diag(σ x )V T x. Denote D x = Diag(σ x ) and D y = Diag(σ). We can rewrite diagonal matrix D y as D y = m σ i Ii s, where σ i = σ i σ i+ for i =,2,...,m, and σ m = σ m. So simply, i matrix I s i is defined in section.. So: i m σ i = σ k. Note that the shrinkage identity i=k X,D y = X, m σ i Ii s = m X, σ i Ii s i m D x, σ i Ii s = D x,d y, i where we used Lemma 2. for the inequality. The equality holds if and only if X = D x. Thus we have Also due to T (X) = T (D x ), X D y 2 F = X 2 F + D y 2 F 2 X,D y D x 2 F + D y 2 F 2 D x,d y = D x D y 2 F. 2 X D y 2 F +λt (X) 2 D x D y 2 F +λt (D x ). Only when X = D x is a diagonal matrix, the above will become equality. So we proved equation (2.3). Denote a diagonal matrix D R m n as D = Diag(d). Then: 2 D Diag(σ) 2 F +λt (D) = m i= i { } 2 d i σ i 2 2 +λρ a ( d i ) By Theorem 2., we have g λ,a (σ i ) = argmin{ d 2 d σ i 2 2 +λρ a ( d ) }. It follows that argmin {D R m n and D is diagonal} 2 D Diag(σ) 2 F +λ T (D) = argmin 2 X Diag(σ) 2 F +λ T (X) X R m n = Diag(g λ,a (σ)). (2.4) In view of (2.2), the matrix X s = UDiag(g λ,a (σ))v T is a global minimizer, which will be denoted as G λ,a (Y ). Let X be another optimal solution for the problem min X R m n 2 X Y 2 F +λt (X) and denote X = U T X V. Then X will be an optimal solution of argmin 2 X X R m n Diag(σ) 2 F +λt (X). Based on the above proof and Theorem 2., we have X =

8 8 TS for Low Rank Matrix Completion Diag(g λ,a (σ)). Since U and V are unitary, X = U X V T = X s. We proved that the matrix G λ,a (Y ) is the unique optimal solution for the optimization problem. The proof is complete. Lemma 2.2. For any fixed λ >, µ > and matrix Z R m n, let X s = G λµ,a (B µ (Z)), then X s is the unique global minimizer of the problem min C λ,µ(x,z), where the X R m n matrix function C λ,µ (X,Z) is defined in (.8) of section.. Proof. First, we will rewrite the formula of C λ,µ (X,Z). Note that A (X) and A (Z) are vectors in space R p. Thus in the formula of C λ,µ (X,Z), there exist norms and inner products for both matrices and vectors. By definition, Thus if we fix matrix Z, C λ,µ (X,Z) = 2 X 2 F X,Z + 2 Z 2 F +λµt (X)+ µ 2 b 2 2 µ(a (X),b A (Z)) µ 2 A (Z) 2 2 = 2 X 2 F + 2 Z 2 F + µ 2 b 2 2 µ 2 A (Z) 2 2 +λµt (X) X,Z +µa (b A (Z)) = 2 X B µ(z) 2 F +λµt (X) 2 B µ(z) 2 F + 2 Z 2 F + µ 2 b 2 2 µ 2 A (Z) 2 2 (2.5) argmin C λ,µ (X,Z) = argmin 2 X B µ(z) 2 F +λµt (X) (2.6) X R m n X R m n Then by Theorem 2.3, X s is the unique global minimizer. Theorem 2.4. For fixed parameters, λ > and < µ < A 2 2. If X is a global minimizer for problem C λ (X), then X is the unique global minimizer for problem min C λ,µ(x,x ). X R m n Proof. C λ,µ (X,X ) = µ{ 2 A (X) b 2 2 +λt (X)} + 2 { X X 2 F µ A (X) A (X ) 2 2} µ{ 2 A (X) b 2 2 +λt (X) } = µc λ (X) µc λ (X ) = C λ,µ (X,X ) The first inequality is due to the fact: A (X) A (X ) 2 2 = Avec(X) Avec(X ) 2 2 A 2 2 vec(x X ) 2 2 A 2 2 X X 2 F (2.7) (2.8) From the above inequalities, we know that X is an optimal solution for min X R m n C λ,µ (X,X ). The uniqueness of X follows from Lemma 2.2. By the above Theorems and Lemmas, if X is a global minimizer of C λ (X), it is also the unique global minimizer of C λ,µ (X,Z) with Z = X, which has the closed form solution formula. Thus we arrive at the following fixed point equation for this global minimizer X : X = G λµ,a (B µ (X )). (2.9) Suppose the SVD for matrix B µ (X ) is U Diag(σ b )V T, then X = U Diag(g λµ,a (σ b ))V T, which means that the singular values of X satisfy σi = g λµ,a(σb,i ), for i =,...,m.
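Theorem 2.3 and the fixed point relation $X^* = G_{\lambda\mu,a}(B_\mu(X^*))$ suggest the basic iteration $X^{n+1} = G_{\lambda\mu,a}(B_\mu(X^n))$, i.e. the TS-IT scheme described at the start of the next section. The sketch below is a minimal, dense-SVD illustration of that iteration for the matrix completion operator $\mathcal{A}(X) = X_\Omega$ with a fixed $\lambda$; it reuses `tl1_threshold` from the earlier sketch, whereas the paper's experiments use an approximate Monte Carlo SVD and the adaptive parameter rules of section 3.

```python
import numpy as np

def ts_matrix_threshold(Y, lam, a):
    """G_{lam,a}(Y) = U Diag(g_{lam,a}(sigma)) V^T of Theorem 2.3 (dense SVD version)."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return U @ np.diag(tl1_threshold(s, lam, a)) @ Vt

def ts_iterative_thresholding(mask, b, shape, lam=0.1, a=1.0, mu=0.99, iters=300):
    """Basic TS-IT: X_{n+1} = G_{lam*mu,a}(X_n + mu*A*(b - A(X_n))) for A(X) = X_Omega.
    For a sampling operator, ||A||_2 = 1, so any mu in (0,1) is admissible."""
    X = np.zeros(shape)
    X[mask] = b                        # observed entries as the initial guess
    for _ in range(iters):
        R = np.zeros(shape)
        R[mask] = b - X[mask]          # A*(b - A(X)) places the residual back on Omega
        X = ts_matrix_threshold(X + mu * R, lam * mu, a)
    return X
```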

3. TS thresholding algorithms
Next we utilize the fixed point equation $X^* = G_{\lambda\mu,a}(B_\mu(X^*))$ derived at the end of section 2 to build two thresholding algorithms for the TS regularized problem (2.3). As in [3, 3], from the equation $X^* = G_{\lambda\mu,a}(B_\mu(X^*)) = U\,\mathrm{Diag}(g_{\lambda\mu,a}(\sigma))V^T$, we replace the optimal matrix $X^*$ by $X^{k+1}$ on the left and by $X^k$ on the right at the $k$-th iteration step:
$$X^{k+1} = G_{\lambda\mu,a}(B_\mu(X^k)) = U^k\,\mathrm{Diag}\big(g_{\lambda\mu,a}(\sigma^k)\big)V^{k,T}, \tag{3.1}$$
where the unitary matrices $U^k$, $V^k$ and the singular values $\sigma^k$ come from the SVD of the matrix $B_\mu(X^k)$. The operator $g_{\lambda\mu,a}(\cdot)$ is defined as in (2.9),
$$g_{\lambda\mu,a}(w) = \begin{cases} 0, & \text{if } |w| < t^*,\\ h_{\lambda\mu}(w), & \text{if } |w| \ge t^*.\end{cases} \tag{3.2}$$
Recall that the thresholding parameter $t^*$ is
$$t^* = \begin{cases} t^*_2 = \lambda\mu\,\frac{a+1}{a}, & \text{if } \lambda \le \frac{a^2}{2(a+1)\mu};\\[2pt] t^*_3 = \sqrt{2\lambda\mu(a+1)} - \frac{a}{2}, & \text{if } \lambda > \frac{a^2}{2(a+1)\mu}.\end{cases} \tag{3.3}$$
With an initial matrix $X^0$, we obtain an iterative algorithm, called the TS iterative thresholding (IT) algorithm; it is the basic TS iterative scheme. Next, two adaptive and more efficient IT algorithms (TS-s and TS-s2) are introduced.

3.1. Semi-Adaptive Thresholding Algorithm TS-s
We begin by formulating an optimality condition for the regularization parameter $\lambda$, which serves as the basis for the parameter selection and updating in this semi-adaptive algorithm. Suppose the optimal solution matrix $X^*$ has rank $r$, by prior knowledge or estimation. Here we still assume $m \le n$. For any $\mu$, denote $B_\mu(X) = X + \mu\,\mathcal{A}^*(b - \mathcal{A}(X))$ and let $\{\sigma_i\}_{i=1}^m$ be the $m$ non-negative singular values of $B_\mu(X)$. Suppose that $X^*$ is the optimal solution matrix of (2.3), and denote the singular values of the matrix $B_\mu(X^*)$ by $\sigma^*_1 \ge \sigma^*_2 \ge \dots \ge \sigma^*_m$. Then, by the fixed point equation, the following inequalities hold:
$$\sigma^*_i > t^* \quad \forall\, i \in \{1,2,\dots,r\}, \qquad \sigma^*_j \le t^* \quad \forall\, j \in \{r+1, r+2, \dots, m\}, \tag{3.4}$$
where $t^*$ is our threshold value. Recall that $t^*_3 \le t^* \le t^*_2$. So
$$\sigma^*_r \ge t^* \ge t^*_3 = \sqrt{2\lambda\mu(a+1)} - \frac{a}{2}, \qquad \sigma^*_{r+1} \le t^* \le t^*_2 = \lambda\mu\,\frac{a+1}{a}. \tag{3.5}$$
It follows that
$$\lambda_1 := \frac{a\,\sigma^*_{r+1}}{\mu(a+1)} \ \le\ \lambda\ \le\ \frac{(a + 2\sigma^*_r)^2}{8(a+1)\mu} =: \lambda_2, \qquad \text{i.e.} \quad \lambda \in [\lambda_1, \lambda_2].$$
The above estimate helps to set an optimal regularization parameter. One choice of $\lambda^*$ is:
(I) when $\lambda_1 \le \frac{a^2}{2(a+1)\mu}$, set $\lambda^* = \lambda_1$; then $\lambda^* \le \frac{a^2}{2(a+1)\mu}$, and thus the thresholding value is $t^* = t^*_2$;
(II) when $\lambda_1 > \frac{a^2}{2(a+1)\mu}$, set $\lambda^* = \lambda_2$; then $\lambda^* > \frac{a^2}{2(a+1)\mu}$, and thus we choose the thresholding value $t^* = t^*_3$.

In practice, we approximate $B_\mu(X^*)$ by $B_\mu(X^n)$ in the above formulas; that is, the $i$-th singular value $\sigma^*_i$ is approximated by $\sigma^n_i$ at the $n$-th iteration step. Thus we have $\lambda_{1,n} = \frac{a\,\sigma^n_{r+1}}{\mu(a+1)}$ and $\lambda_{2,n} = \frac{(a + 2\sigma^n_r)^2}{8(a+1)\mu}$. We choose the parameter $\lambda$ at the $n$-th step as
$$\lambda_n = \begin{cases} \lambda_{1,n}, & \text{if } \lambda_{1,n} \le \frac{a^2}{2(a+1)\mu},\\[2pt] \lambda_{2,n}, & \text{if } \lambda_{1,n} > \frac{a^2}{2(a+1)\mu}.\end{cases} \tag{3.6}$$
This way, we obtain an adaptive iterative algorithm without pre-setting the regularization parameter $\lambda$. The TL parameter $a$ is still free and needs to be selected beforehand; thus the algorithm is overall semi-adaptive, called TS-s for short and summarized in Algorithm 1.

Algorithm 1: TS-s thresholding algorithm
Initialize: given $X^0$ and parameters $\mu$ and $a$.
while NOT converged do
1. $Y^n = B_\mu(X^n) = X^n - \mu\,\mathcal{A}^*(\mathcal{A}(X^n) - b)$, and compute the SVD of $Y^n$ as $Y^n = U\,\mathrm{Diag}(\sigma)\,V^T$;
2. determine the value of $\lambda_n$ by (3.6), then obtain the thresholding value $t^*_n$ by (3.3);
3. $X^{n+1} = G_{\lambda_n\mu,a}(Y^n) = U\,\mathrm{Diag}(g_{\lambda_n\mu,a}(\sigma))\,V^T$;
then $n \leftarrow n+1$.
end while

3.2. Adaptive Thresholding Algorithm TS-s2
Differently from TS-s, where the parameter $a$ needs to be determined manually, here at each iterative step we choose $a = a_n$ such that the equality $\lambda_n = \frac{a_n^2}{2(a_n+1)\mu}$ holds. The threshold value $t^*$ is then given by a single formula with $t^* = t^*_3 = t^*_2$. Putting $\lambda$ at the critical value $\frac{a^2}{2(a+1)\mu}$, the parameter $a$ is expressed as
$$a = \lambda\mu + \sqrt{(\lambda\mu)^2 + 2\lambda\mu}. \tag{3.7}$$
The threshold value is
$$t^* = \lambda\mu\,\frac{a+1}{a} = \frac{\lambda\mu}{2} + \frac{\sqrt{(\lambda\mu)^2 + 2\lambda\mu}}{2}. \tag{3.8}$$
Let $X^*$ be the TS optimal solution and $\sigma^*$ the singular values of the matrix $B_\mu(X^*)$. Then we have the following inequalities:
$$\sigma^*_i > t^* \quad \forall\, i \in \{1,2,\dots,r\}, \qquad \sigma^*_j \le t^* \quad \forall\, j \in \{r+1, r+2, \dots, m\}. \tag{3.9}$$
So, for the parameter $\lambda$, we have
$$\frac{2(\sigma^*_{r+1})^2}{1 + 2\sigma^*_{r+1}} \ \le\ \lambda\ \le\ \frac{2(\sigma^*_r)^2}{1 + 2\sigma^*_r}.$$
Once the value of $\lambda$ is determined, the parameter $a$ is given by (3.7). In the iterative method, we approximate the optimal solution $X^*$ by $X^n$, and further use the singular values $\{\sigma^n_i\}_i$ of $B_\mu(X^n)$ in place of those of $B_\mu(X^*)$. The resulting parameter selection is
$$\lambda_n = \frac{2(\sigma^n_{r+1})^2}{1 + 2\sigma^n_{r+1}}, \qquad a_n = \lambda_n\mu + \sqrt{(\lambda_n\mu)^2 + 2\lambda_n\mu}. \tag{3.10}$$
In this algorithm (TS-s2 for short), only the parameter $\mu$ is fixed, satisfying the inequality $\mu \in (0, \|\mathcal{A}\|^{-2})$. The scheme is summarized in Algorithm 2, and a short code sketch of both parameter rules follows after it.

Algorithm 2: TS-s2 thresholding algorithm
Initialize: given $X^0$ and the parameter $\mu$.
while NOT converged do
1. $Y^n = X^n - \mu\,\mathcal{A}^*(\mathcal{A}(X^n) - b)$, and compute the SVD of $Y^n$ as $Y^n = U\,\mathrm{Diag}(\sigma)\,V^T$;
2. determine the values of $\lambda_n$ and $a_n$ by (3.10), then update the threshold value $t^*_n = \lambda_n\mu\,\frac{a_n+1}{a_n}$;
3. $X^{n+1} = G_{\lambda_n\mu,a_n}(Y^n) = U\,\mathrm{Diag}(g_{\lambda_n\mu,a_n}(\sigma))\,V^T$;
then $n \leftarrow n+1$.
end while
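The parameter rules (3.6) and (3.7)–(3.10) amount to a few lines of bookkeeping per iteration. The following sketch is our paraphrase of those updates for the matrix completion operator $\mathcal{A}(X) = X_\Omega$ (it assumes the target rank $r$ is given and reuses `tl1_threshold` from the earlier sketch); it is meant to show the selection logic, not to reproduce the authors' MATLAB code.

```python
import numpy as np

def ts_s_lambda(sigma, r, mu, a):
    """Semi-adaptive TS-s rule (3.6): lambda_n from the singular values of B_mu(X^n)."""
    lam1 = a * sigma[r] / (mu * (a + 1.0))                            # from sigma_{r+1} <= t2*
    lam2 = (a + 2.0 * sigma[r - 1]) ** 2 / (8.0 * (a + 1.0) * mu)     # from sigma_r >= t3*
    return lam1 if lam1 <= a * a / (2.0 * (a + 1.0) * mu) else lam2

def ts_s2_params(sigma, r, mu):
    """Adaptive TS-s2 rule (3.10): lambda_n from sigma_{r+1}, then a_n at the critical value (3.7)."""
    s = max(sigma[r], 1e-12)                                          # sigma_{r+1}, 0-based index
    lam = 2.0 * s * s / (1.0 + 2.0 * s)
    a = lam * mu + np.sqrt((lam * mu) ** 2 + 2.0 * lam * mu)
    return lam, a

def ts_s2_step(X, mask, b, r, mu=0.99):
    """One TS-s2 iteration for the completion operator A(X) = X_Omega."""
    Y = X.copy()
    Y[mask] += mu * (b - X[mask])                                     # Y = B_mu(X)
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    lam, a = ts_s2_params(s, r, mu)
    return U @ np.diag(tl1_threshold(s, lam * mu, a)) @ Vt
```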

4. Numerical experiments
In this section, we present numerical experiments to illustrate the effectiveness of our algorithms, the semi-adaptive TS-s and the adaptive TS-s2, compared with several state-of-the-art solvers on matrix completion problems. The comparison solvers are: LMaFit [28], FPCA [23], sirls-q [24], IRucLq-M [2], and LRGeomCG [34].

LMaFit solves a low-rank factorization model instead of computing SVDs, which usually take a big chunk of the computation time; part of its code is written in C, as is LRGeomCG's. So, once this method converges, it is the fastest method among all the comparisons. All other codes are implemented in the Matlab environment and involve SVDs approximated by fast Monte Carlo algorithms [, ]. FPCA is a nuclear norm minimization code, while sirls-q and IRucLq-M are iterative reweighted least squares algorithms for Schatten-q quasi-norm optimization. The LRGeomCG algorithm approaches matrix completion through Riemannian optimization: it minimizes the least-squares distance on the sampling set over the Riemannian manifold of fixed-rank matrices. When the rank information is known or well approximated, this method is efficient and accurate, as shown in the experiments below, especially for standard Gaussian matrices; a drawback of LRGeomCG, however, is that the rank of the manifold is fixed, so it is hard for it to handle unknown rank cases.

In our TS algorithms, the MC SVD algorithm [] is implemented at each iteration step, as in FPCA. We also tried other fast SVD approximation algorithms, but MC SVD is the most suitable one, satisfying both the speed and accuracy requirements within one iterative algorithm. (The TS matlab codes are publicly available for download.)

All our tests were performed on a Lenovo desktop with 16 GB of RAM and a quad-core Intel i7-4770 processor with CPU at 3.4 GHz, under a 64-bit Ubuntu system. We tested and compared these solvers on low rank matrix completion problems under various conditions, including multivariate Gaussian, uniform and $\chi^2$ distributions. We also tested the algorithms on grayscale image recovery from partial observations (image inpainting).

4.1. Implementation details
In the following series of tests, we generated random matrices $M = M_L M_R^T \in \mathbb{R}^{m\times n}$, where the matrices $M_L$ and $M_R$ are in $\mathbb{R}^{m\times r}$ and $\mathbb{R}^{n\times r}$ respectively. By setting the parameter $r$ to be small, we obtain a low rank matrix $M$ with rank at most $r$. After this step, we uniformly random-sampled a subset $\Omega$ with $p$ entries from $M$. The following quantities help to quantify the difficulty of a recovery problem (a short code sketch of them is given below):
SR (sampling ratio): $SR = p/(mn)$.
FR (freedom ratio): $FR = r(m+n-r)/p$, the number of degrees of freedom of a rank-$r$ matrix divided by the number of measurements. According to [23], if $FR > 1$, there are infinitely many matrices of rank $r$ with the given entries.
$r_m$ (maximum rank with which the matrix can be recovered): $r_m = \left\lfloor \frac{m+n-\sqrt{(m+n)^2 - 4p}}{2}\right\rfloor$, defined as the largest rank such that $FR \le 1$.

The TS thresholding algorithms do not guarantee a global minimum in general, similar to non-convex schemes in one-dimensional compressed sensing problems. Indeed, we observe that TS thresholding with random starts may get stuck at local minima, especially when the freedom ratio FR is high, i.e. when the matrix completion problem is difficult. A good initial matrix $X^0$ is important for thresholding algorithms. In our numerical experiments, instead of choosing $X^0 = 0$ or a random matrix, we set $X^0$ equal to the matrix whose entries agree with $M$ on $\Omega$ and are zero elsewhere. The stopping criterion is
$$\frac{\|X^{n+1} - X^n\|_F}{\max\{\|X^n\|_F, 1\}} \le \mathrm{tol},$$
where $X^{n+1}$ and $X^n$ are the numerical results from two consecutive iterative steps, and tol is a moderately small number. In all the following experiments, we fix $\mathrm{tol} = 10^{-6}$, with a cap on the maximum number of iteration steps. We also use the relative error
$$\mathrm{rel.err} = \frac{\|X_{opt} - M\|_F}{\|M\|_F} \tag{4.1}$$
to estimate the closeness of $X_{opt}$ to $M$, where $X_{opt}$ is the optimal solution produced by the numerical algorithms.
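For reference, the difficulty measures and the error metric just defined are computed as follows (a small helper of ours; the 5e-3 success threshold quoted later in the parameter tests is passed in only as an example default).

```python
import numpy as np

def sampling_ratio(p, m, n):
    """SR = p / (m n)."""
    return p / (m * n)

def freedom_ratio(r, m, n, p):
    """FR = r (m + n - r) / p: degrees of freedom of a rank-r matrix over the number of samples."""
    return r * (m + n - r) / p

def max_recoverable_rank(m, n, p):
    """r_m: the largest rank with FR <= 1 (floor of the smaller root of r (m + n - r) = p)."""
    return int(np.floor((m + n - np.sqrt((m + n) ** 2 - 4.0 * p)) / 2.0))

def rel_err(X_opt, M, success_tol=5e-3):
    """Relative error (4.1) and the success flag rel.err < success_tol."""
    err = np.linalg.norm(X_opt - M, 'fro') / np.linalg.norm(M, 'fro')
    return err, err < success_tol
```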

Rank estimation. For thresholding algorithms, the rank $r$ is the most important parameter, especially for our TS methods, where the thresholding value $t^*$ is determined based on $r$. If the true rank $r$ is unknown, we adopt the rank decreasing estimation method (also called the maximum eigengap method) as in [2, 28], thereby extending both the TS-s and TS-s2 schemes to work with an overestimated initial rank parameter $K$. In the following tests, unless otherwise specified, we set $K = 1.5\,r$.

The idea behind this estimation method is as follows. Suppose that at step $n$ our current matrix is $X$. The eigenvalues of $X^T X$ are arranged in descending order, and $\lambda_{r_{min}} \ge \lambda_{r_{min}+1} \ge \dots \ge \lambda_{K+1} > 0$ are the $r_{min}$-th through $(K+1)$-th eigenvalues of $X^T X$, where $r_{min}$ is a manually specified minimum rank estimate. We then compute the quotient sequence $\bar\lambda_i = \lambda_i/\lambda_{i+1}$, $i = r_{min},\dots,K$, and let
$$\hat K = \arg\max_{r_{min}\le i\le K} \bar\lambda_i,$$
the index corresponding to the maximal element of $\{\bar\lambda_i\}$. If the eigenvalue gap indicator $\tau = \bar\lambda_{\hat K}(K - r_{min} + 1)/\sum_{i\ne \hat K}\bar\lambda_i$ exceeds a preset threshold, we adjust our rank estimate from $K$ to $\hat K$. During the numerical simulations, we perform this adjustment only once for each problem. In most cases this adjustment is quite satisfactory, and the adjusted estimate is very close to the true rank $r$. (A small code sketch of this estimator is given below.)

Choice of a: optimal parameter testing for TS-s. A major difference between TS-s and TS-s2 is the choice of the parameter $a$, which influences the behaviour of the penalty function $\rho_a(\cdot)$ of TS. When $a$ tends to zero, the function $T(X)$ approaches the rank. We tested TS-s on small-size low rank matrix completion problems with several values of $a$, spanning from well below 1 to well above 1, for both the known-rank scheme and the scheme with rank estimation. In these tests, $M = M_L M_R^T$ is a random matrix, where $M_L$ and $M_R$ are generated with i.i.d. standard normal entries, and the rank $r$ of $M$ varies up to 22. For each value of $a$, we conducted repeated independent tests with different $M$ and sample index sets $\Omega$, and we declared $M$ to be recovered successfully if the relative error (4.1) was less than $5\times 10^{-3}$. The test results for the known-rank scheme and the rank-estimation scheme are both shown in Figure 4.1. The success rate curves of the rank-estimation scheme are not as clustered as those of the known-rank scheme; in order to clearly identify the optimal parameter $a$, we omitted the curve of the smallest $a$ in the right figure, as it always lies below all the others. The vertical red dotted line there indicates the position where $FR = 0.6$.

It is interesting to see that for the known-rank scheme, the parameter $a = 1$ is the optimal strategy, which coincides with the optimal parameter setting in [3]: when using a thresholding algorithm under the transformed L1 (TL) or transformed Schatten-1 (TS) quasi-norm, it is usually optimal to set $a = 1$ when the sparsity or rank information is given. For the scheme with rank estimation, the picture is more complicated. Based on our tests, the best choice of $a$ depends on whether FR is below or above 0.6, and for all the following tests TS-s with rank estimation uses the value of $a$ selected by this FR = 0.6 rule. In applications where FR is not available, we suggest the choice made for $FR \ge 0.6$, since its performance is also acceptable when $FR < 0.6$.
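A compact version of the maximum eigengap estimator just described is sketched below (our reading of the procedure of [2, 28]; the acceptance threshold for the indicator τ is not legible in this copy, so it is exposed as a parameter with an assumed default).

```python
import numpy as np

def estimate_rank(X, K, r_min=1, tau_threshold=10.0):
    """Rank-decreasing (maximum eigengap) estimate from an overestimate K.

    Sorts the eigenvalues of X^T X in descending order, forms the quotients
    lambda_i / lambda_{i+1} for i = r_min..K, and accepts the position of the
    largest quotient as the new rank estimate when the gap indicator tau
    exceeds tau_threshold (the default threshold value is our assumption)."""
    ev = np.sort(np.linalg.eigvalsh(X.T @ X))[::-1]
    lam = ev[r_min - 1:K + 1]                                 # lambda_{r_min}, ..., lambda_{K+1}
    quot = lam[:-1] / np.maximum(lam[1:], np.finfo(float).tiny)
    k_hat = int(np.argmax(quot))                              # offset of the largest eigengap
    tau = quot[k_hat] * (K - r_min + 1) / max(quot.sum() - quot[k_hat], np.finfo(float).tiny)
    return (r_min + k_hat, tau) if tau > tau_threshold else (K, tau)
```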

Fig. 4.1: Optimal parameter test for the semi-adaptive method TS-s: frequency of success versus rank for several values of $a$, with the rank known a priori (left) and with rank estimation (right); in the right panel FR changes from 0.23 to 0.78 along with the rank.

4.2. Completion of Random Matrices
The ground truth matrix $M$ is generated as the matrix product of two low rank matrices $M_L$ and $M_R$, of dimensions $m\times r$ and $n\times r$ respectively, with $r \ll \min(m,n)$. In the following experiments, except where clearly stated otherwise, $M_L$ and $M_R$ are generated from the multivariate normal distribution $\mathcal{N}(\mu, \Sigma)$, with $\mu = 0$ and $\Sigma = \{(1-\mathrm{cov})\,I_{(i=j)} + \mathrm{cov}\}_{r\times r}$ determined by the parameter cov. Thus the matrix $M = M_L M_R^T$ has rank at most $r$. It is known that successful recovery is related to FR: the higher FR is, the harder it is to recover the original low rank matrix.

In the first batch of tests, we varied the rank $r$ and fixed all other parameters, i.e. the matrix size $(m,n)$ and the sampling rate (SR); thus FR changed along with the rank. It is observed that the performance of TS-s and TS-s2 is very different, due to adopting single or double thresholds. TS-s2 uses only one (smooth) thresholding scheme with changing parameter $a$; it converges faster than TS-s when the rank is known (see the known-rank results below). On the other hand, TS-s utilizes two (smooth and discontinuous) thresholding schemes and is more robust in the case of an overestimated rank; TS-s outperforms TS-s2 when rank estimation is used in lieu of the true rank value (see the rank-estimation results below). The IRucL-q method is found to be very robust under varied covariance and rank estimation, yet it underperforms the TS methods at high FR, even with more computing time. Although the TS methods rely on the same rank estimation method as IRucL-q, IRucL-q achieves the best results in the absence of the true rank value. A possible reason is that in the IRucL-q iterations the singular values of the matrix $X$ are computed more accurately, whereas in TS the singular values are computed by a fast Monte Carlo method at every iteration; due to the random sampling of the Monte Carlo method, there are more errors, especially at the beginning stage of the iteration, and the resulting matrices $X^n$ may cause a less accurate rank estimation.

Matrix completion with known rank. In this subsection, we implemented all six algorithms under the condition that the true rank value is given. They are TS-s, TS-s2, sirls-q, IRucL-q, LMaFit and

15 S. Zhang, P. Yin, J. Xin 5 LRGeomCG. We skipped FPCA since rank is always adaptively estimated there. Gaussian matrices with different ranks. In these tests, matrix M = M L M T R was generated under uncorrelated normal distribution with µ =. We conducted tests both on low dimensional matrices with m = n = (Table 4.) and high dimensional matrices with m = n = (Table 4.2). Tests on non-square matrices with m n show similar results. In Table 4., rank r varies from 5 to 8, while FR increases from.2437 up to.89. For lower rank (less than 5), LMaFit is the best algorithm with low relative errors and fast convergence speed. Part of the reason is that this method does not involve SVD (singular value decomposition) operations during iteration. LRGeomCG approaches the performance of LMaFit when r. However, as FR values are above.7, it became hard for LMaFit to find truth low rank matrix M. Its performance is not as good as stated in paper [34] with possible reason that we generate M with mean µ equal to, instead of in [34]. We also tested LRGeomCG with µ = where it has very small relative error and also fast convergence rate. It is also noticed that in Table 4., the two TS algorithms performed very well and remained stable for different FR values. At similar order of accuracy, the TLs are faster than IRucL-q. For large size matrices (m = n = ), rank r is varied from 5 to, see table 4.2. The sirls-q and LMaFit only worked for lower FR. IRucL-q can still produce satisfactory results with relative error around 3, but its iterations took longer time. In [2], it was carried out by high speed-performance CPU with many cores. Here we used an ordinary processor with only 4 cores and 8 threads. It is believed that with a better machine, IRucL-q will be much faster, since parallel computing is embedded in its codes. As seen in the table, LRGeomCG is always convergent and achieves almost same accuracy with TS-s and TS-s2. However, its computation time grows fast with increasing rank. A little difference between the two TS algorithms begins to emerge when the matrix size is large. Although when rank is given, they all performed better than other schemes, adaptive TS-s2 is a little faster than semi-adaptive TS-s. It is believed by choosing optimal parameter a, TS-s will be improved. The parameter a is related to matrix M, i.e. how it is generated, its inner structure, and dimension. In TS-s2, the value of parameter a does not need to be manually determined. Gaussian Matrices with Different Covariance. In this subsection, the rank r, the sampling rate, and the freedom ratio FR are fixed. We varied parameter cov to generate covariance matrices of multivariate normal distribution. In Table 4.3, we chose two rank values, r = 5 and r = 8. It is harder to recover the original matrix M when it is more coherent. IRucL-q does better in this regime. Its mean computing time and relative errors are less influenced by the changing cov. Results on large size matrices are shown in Table 4.4. TS-s2 scheme is much better than TS-s, both in relative error and computing time. In small size matrix experiments, TS-s2 is the best among comparisons. In Table 4.4, we fixed rank = 3 with cov among {.,...,.7}. TS-s2 is still satisfactory both in accuracy and speed for low covariance (i.e cov.6). However, for cov.7, relative errors increased from 6 to around 4. It is also observed that IRucL-q algorithm is very stable and robust under covariance change. Matrices from other distributions. 
We also compare algorithms with other distributions, including (,) uniform distribution and Chi-square distribution with k

16 6 TS for Low Rank Matrix Completion Table 4.: Comparison of TS-s, TS-s2, sirls-q, IRucL-q, LMaFit and LRGeomCG on recovery of uncorrelated multivariate Gaussian matrices at known rank, m = n =, SR =.4, with stopping criterion tol = 6. Problem TS-s TS-s2 sirls-q* rank FR rel.err time rel.err time rel.err time e e e e e e e e e e e-7.2.3e e e e e-5.33.e e e e e e e e e e e e-4. Problem IRucL-q LMaFit LRGeomCG rank FR rel.err time rel.err time rel.err time e e-6.2.3e e e e e e e e e e e e e e e e e e e e e e e e-.34.9e e e e e e e-.94 * Notes:. The sirls-q iterations did not converge when rank > 4 and FR.65. Comparison is skipped over this range. Results for rank (,2,3) are skipped as they differ little from rank 4. Similar rank samplings occur in Tables 4.(2-3), 4.(5-6). 2. Matrix M is generated from multivariate normal distribution with mean µ =, instead of. = (degree of freedom). All other parameters are same as Table 4.. The results are displayed at Table 4.5 (uniform distribution) and Table 4.6 (Chi-square distribution). Only partial numerical results are showed here with rank r = 7,8,9,,4,5. From these two tables, two TS algorithms have satisfying relative errors and stable performance, same as IRuccL-q. For these two non-gaussian distributions, it becomes harder to successfully recover low rank matrix for LMaFit and LRGeomCG, especially when rank r > Matrix completion with rank estimation We conducted numerical experiments on rank estimation schemes. The initial rank estimation is given as.5r, which is a commonly used overestimate. FPCA [23] is included for comparison, while LRGeomCG and sirls-q are excluded. FPCA is a fast and robust iterative algorithm based on nuclear norm regularization. We considered two classes of matrices: uncorrelated Gaussian matrices with changing rank; correlated Gaussian matrices with fixed rank (r = 5,). The results are

17 S. Zhang, P. Yin, J. Xin 7 Table 4.2: Numerical experiments on recovery of uncorrelated multivariate Gaussian matrices at known rank, m = n =, SR =.3. Problem TS-s TS-s2 sirls-q rank FR rel.err time rel.err time rel.err time e e e e e e e e e e e e Problem IRucL-q LMaFit LRGeomCG rank FR rel.err time rel.err time rel.err time e e e e e e e e e e e e Table 4.3: Numerical experiments on multivariate Gaussian matrices with varying covariance at known rank, m = n =, SR =.4. Problem TS-s TS-s2 sirls-q rank cor rel.err time rel.err time rel.err time e e e e e e e e e e e e e e e e e e e e e- 6.8 Problem IRucL-q LMaFit LRGeomCG rank cor rel.err time rel.err time rel.err time e e-2.7.2e e e e e-5.7.e e e e e e e-.25.7e e e e e e e-.29 shown in Table 4.7 and Table 4.8. It is interesting that under rank estimation, the semi-adaptive TS-s fared much better than TS-s2. In low rank and low covariance cases, TS-s is the best in terms of accuracy and computing time among comparisons. However, in the regime of high covariance and rank, it became harder for TS methods to perform efficient recovery. IRucL-q did the best, being both stable and robust. In the most difficult case, at rank = 5 and FR approximately equal to.7, IRucL-q can still obtain an accurate result with relative error around Image inpainting

18 8 TS for Low Rank Matrix Completion Table 4.4: Numerical experiments on multivariate Gaussian matrices with varying covariance at known rank, m = n =, SR =.4. Problem TS-s TS-s2 sirls-q rank cor rel.err time rel.err time rel.err time e e e e e e e e e e e e e e e e e e e e e- 5.3 Problem IRucL-q LMaFit LRGeomCG rank cor rel.err time rel.err time rel.err time e e e e e e e e e e e e e e e e e e e e e Table 4.5: Comparison with random matrices generated from (,) uniform distribution. Rank r is given and m = n =, SR =.4, with stopping criterion tol = 6. Problem TS-s TS-s2 sirls-q* rank FR rel.err time rel.err time rel.err time e e e e e e e e e e e e e-5.8.2e-5.58 Problem IRucL-q LMaFit LRGeomCG rank FR rel.err time rel.err time rel.err time e e e e e e e e e e e e e e-.8.24e e e-.6.7e-.76 As in [2, 28], we conducted grayscale image inpainting experiments to recover low rank images from partial observations, and compare with IRcuL-q and LMaFit algorithms. The boat image (see Figure 4.2) is used to produce ground truth as in [2] with rank equal to 4 and at resolution. Different levels of noisy disturbances

19 S. Zhang, P. Yin, J. Xin 9 Table 4.6: Comparison with random matrices generated from Chi-square distribution with k = (degree of freedom). Rank r is given and m = n =, SR =.4, with stopping criterion tol = 6. Problem TS-s TS-s2 sirls-q* rank FR rel.err time rel.err time rel.err time e e e e e e e e e e e e e e-5.73 Problem IRucL-q LMaFit LRGeomCG rank FR rel.err time rel.err time rel.err time e e-6.4.8e e e e e e e e e e e e-.5.46e e e e-.57 Table 4.7: Numerical experiments for low rank matrix completion algorithms under rank estimation. True matrices are uncorrelated multivariate Gaussian, m = n =, SR =.4. Problem TS-s TS-s2 FPCA IRucL-q LMaFit rank FR rel.err time rel.err time rel.err time rel.err time rel.err time e e e-.9.84e e e e e e e e e e e e e e e e e e e e e e e e e e e-.2 Table 4.8: Numerical experiments on low rank matrix completion algorithms under rank estimation. True matrices are multivariate Gaussian with different covariance, m = n =, and SR =.4. Problem TS-s TS-s2 FPCA IRucL-q LMaFit rank cor rel.err time rel.err time rel.err time rel.err time rel.err time e e e e e e e e e e e e e e e-2..5.e e-.4.2e e e e e-.4.2e e e e e e e e-2.

Fig. 4.2: Image inpainting experiments with SR = 0.3: original image, noisy sampled image, and recoveries by TS-s2, IRucL-q, LMaFit-inc and LMaFit-fix.

As in [2, 28], we conducted grayscale image inpainting experiments to recover low rank images from partial observations, and compared with the IRucL-q and LMaFit algorithms. The boat image (see Figure 4.2) is used to produce the ground truth as in [2], with rank equal to 40 and a fixed resolution. Different levels of noisy disturbance are added to the original image $M_o$ by the formula
$$M = M_o + \sigma\,\frac{\|M_o\|_F}{\|\varepsilon\|_F}\,\varepsilon,$$
where the matrix $\varepsilon$ has i.i.d. standard Gaussian entries. Here we only applied the scheme TS-s2. For IRucL-q, we followed the setting in [2] by choosing $\alpha = 0.9$ and $\lambda = 10^{-2}\sigma$. Both the fixed rank (LMaFit-fix) and the increasing rank (LMaFit-inc) schemes are implemented for LMaFit. We took the fixed rank $r = 40$ for TS-s2, LMaFit-fix and IRucL-q. Computational results are in Table 4.9, with sampling ratios varying among $\{0.3, 0.4, 0.5\}$ and noise strengths $\sigma$ in $\{0.01, 0.05, 0.1, 0.15, 0.2, 0.25\}$. The performance of each algorithm is measured in CPU time, PSNR (peak signal-to-noise ratio), and MSE (mean squared error). Here we focus more on the PSNR values and mark the top two in bold for each experiment. We observed that IRucL-q and TS-s2 fared about the same, and either one is better than LMaFit in most cases.

5. Conclusion
We presented the transformed Schatten-1 penalty (TS) and derived the closed form thresholding representation formula for global minimizers of the TS regularized rank minimization problem. We studied two adaptive iterative TS schemes (TS-s and TS-s2) computationally for matrix completion, in comparison with several state-of-the-art methods, in particular IRucL-q. In the case of low rank matrix recovery under known rank, TS-s2 performs the best in accuracy and computational speed. In low rank matrix recovery under rank estimation, TS-s is almost on par with IRucL-q, except when both the matrix covariance and the rank rise to a certain level. In

21 S. Zhang, P. Yin, J. Xin 2 Table 4.9: Numerical experiments on boat image inpainting with algorithms TS, IRcuL-q and LMaFit under different sampling ratio and noise levels. Problem TS-s2 IRucL-q LMaFit-inc LMaFit-fix SR σ Time PSNR MSE Time PSNR MSE Time PSNR MSE Time PSNR MSE e e e e e e e e e e e e e e e e e e e e e e e e e e e e e e e e e e e e e e e e e e e e e e e e e e e e e e e e e e e e e e e e e e e e e e e e-2 future work, we shall study rank estimation techniques to further improve on TS-s and explore other applications for TS penalty. Acknowledgments. The authors would like to thank Prof. Wotao Yin for his helpful suggestions on low rank matrix completion methods and numerical experiments. We also thank the anonymous referees for their constructive comments Appendix A. Proof of Ky Fan k-norm inequality. Proof. Since X = UDiag(σ)V T, the (j,k)-th entry of matrix X is X j,k = m σ i U j,i V k,i. i= Thus, we have tr k (X) = k j= = m i=j= X j,j = k j=i= m σ i U j,i V j,i k σ i U j,i V j,i = m i= σ i w (k) i, (.) where the weight w (k) i for the singular value σ i is defined as: w (k) i = k U j,i V j,i, i =,2,...,m. (.2) j=

22 22 TS for Low Rank Matrix Completion Notice that, w (k) i k U j,i V j,i U(:,i) 2 V (:,i) 2, (.3) j= where U(:,i) and V (:,i) are the i-th column vectors for U and V. {w (k) i }, m i= w (k) i m i=j= k j= k U j,i V j,i = k m U j,i V j,i j=i= U(j,:) 2 V (j,:) 2 k, Also for weights (.4) where U(j,:) and V (j,:) are the j-th row vectors for U and V, respectively. All the m weights are bounded by, with absolute sum at most k m. Note that σ i s are in decreasing order. By equation (.), we have, for all k =,2,...,m, tr k (X) m i= σ i w (k) i k σ i = tr k (D) = X F k. i= Next, we prove the second part of the lemma equality condition, by mathematical induction. Suppose that for a given matrix X, tr k (X) = tr k (D), k =,...,m. Here, it is convenient to define X i = σ i U i Vi T, where V i (U i ) is the i-th column vector of V (U). Then matrix X can be decomposed as the sum of r rank- matrices, X = r X i. When k =, according to tr (X) = tr (D) and the proof above, we know that w () = and w () = for i = 2,...,m. i By the definition of weights w (k) i in (.2), we have w () = U, V, =. Since U and V are both unitary matrices, we have: U, = V, = ±; U,j = U j, = V,j = V j, = for j. Then vectors U (V ) is the first standard basis vector in space R m (R n ). The matrix X = σ U V T is diagonal σ X =... m n i= For any index i, i k, suppose that U i,i = V i,i = ±; U i,j = U j,i = V i,j = V j,i = for any index j i. (.5)

23 S. Zhang, P. Yin, J. Xin 23 Then matrix X i = σ i U i Vi T, with i k, is diagonal and can be expressed as... X i = σ i (i-th row)... Under those conditions, let us consider the case with index i = k. Clearly, we have tr k (X) = tr k (D). Similarly as before, thanks to the formula (.) and inequalities (.3) and (.4), it is true that m n w (k) i = for i =,...,k; and w (k) i = for i > k. Furthermore, by definition (.2), w (k) k = k U j,k V j,k = U k,k V k,k =. This is because U j,k = V j,k = for index j < k, by the assumption (.5). Thus vectors U k and V k are also standard basis vectors with the k-th entry to be ±. Then... X k = σ k U k Vk T = σ k (k-th row) Finally, we prove that all matrices {X i } i=,,r are diagonal. So the original matrix X = r X i is equal to the diagonal matrix D. The other direction is obvious. We finish i= the proof. j=... m n REFERENCES [] T. Blumensath, Accelerated iterative hard thresholding, Signal Processing, 92(3), pp , 22. [2] J. Cai, E. Candès, and Z. Shen. A singular value thresholding algorithm for matrix completion. SIAM Journal on Optimization, 2(4): , 2. [3] E. Candès and T. Tao. The power of convex relaxation: Near-optimal matrix completion. Information Theory, IEEE Transactions on, 56(5):253 28, 2. [4] E. Candès, and B. Recht, Exact matrix completion via convex optimization, Found. Comput. Math., 9 (29), pp [5] W. Cao, J. Sun, and Z. Xu, Fast image deconvolution using closed-form thresholding formulas of L q,(q = /2,2/3) regularization, Journal of Visual Communication and Image Representation, 24(), pp. 3 4, 23. [6] Y. Chen, A. Jalali, S. Sanghavi, and C. Caramanis. Low-rank matrix recovery from errors and erasures. Information Theory, IEEE Transactions on, 59(7): , 23. [7] I. Daubechies, R. DeVore, M. Fornasier, C. Gunturk, Iteratively reweighted least squares minimization for sparse recovery, Comm. Pure Applied Math, 63(), pp. 38, 2. [8] I. Daubechies, M. Defrise, and C. De Mol. An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. Communications on pure and applied mathematics, 57():43-457, 24.


More information

Iterative Reweighted Minimization Methods for l p Regularized Unconstrained Nonlinear Programming

Iterative Reweighted Minimization Methods for l p Regularized Unconstrained Nonlinear Programming Iterative Reweighted Minimization Methods for l p Regularized Unconstrained Nonlinear Programming Zhaosong Lu October 5, 2012 (Revised: June 3, 2013; September 17, 2013) Abstract In this paper we study

More information

Sparse PCA with applications in finance

Sparse PCA with applications in finance Sparse PCA with applications in finance A. d Aspremont, L. El Ghaoui, M. Jordan, G. Lanckriet ORFE, Princeton University & EECS, U.C. Berkeley Available online at www.princeton.edu/~aspremon 1 Introduction

More information

SPARSE signal representations have gained popularity in recent

SPARSE signal representations have gained popularity in recent 6958 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 57, NO. 10, OCTOBER 2011 Blind Compressed Sensing Sivan Gleichman and Yonina C. Eldar, Senior Member, IEEE Abstract The fundamental principle underlying

More information

Stability and Robustness of Weak Orthogonal Matching Pursuits

Stability and Robustness of Weak Orthogonal Matching Pursuits Stability and Robustness of Weak Orthogonal Matching Pursuits Simon Foucart, Drexel University Abstract A recent result establishing, under restricted isometry conditions, the success of sparse recovery

More information

Subspace Projection Matrix Completion on Grassmann Manifold

Subspace Projection Matrix Completion on Grassmann Manifold Subspace Projection Matrix Completion on Grassmann Manifold Xinyue Shen and Yuantao Gu Dept. EE, Tsinghua University, Beijing, China http://gu.ee.tsinghua.edu.cn/ ICASSP 2015, Brisbane Contents 1 Background

More information

Augmented Lagrangian alternating direction method for matrix separation based on low-rank factorization

Augmented Lagrangian alternating direction method for matrix separation based on low-rank factorization Augmented Lagrangian alternating direction method for matrix separation based on low-rank factorization Yuan Shen Zaiwen Wen Yin Zhang January 11, 2011 Abstract The matrix separation problem aims to separate

More information

Rapid, Robust, and Reliable Blind Deconvolution via Nonconvex Optimization

Rapid, Robust, and Reliable Blind Deconvolution via Nonconvex Optimization Rapid, Robust, and Reliable Blind Deconvolution via Nonconvex Optimization Shuyang Ling Department of Mathematics, UC Davis Oct.18th, 2016 Shuyang Ling (UC Davis) 16w5136, Oaxaca, Mexico Oct.18th, 2016

More information

Sparse Approximation via Penalty Decomposition Methods

Sparse Approximation via Penalty Decomposition Methods Sparse Approximation via Penalty Decomposition Methods Zhaosong Lu Yong Zhang February 19, 2012 Abstract In this paper we consider sparse approximation problems, that is, general l 0 minimization problems

More information

Robust Principal Component Pursuit via Alternating Minimization Scheme on Matrix Manifolds

Robust Principal Component Pursuit via Alternating Minimization Scheme on Matrix Manifolds Robust Principal Component Pursuit via Alternating Minimization Scheme on Matrix Manifolds Tao Wu Institute for Mathematics and Scientific Computing Karl-Franzens-University of Graz joint work with Prof.

More information

High Dimensional Covariance and Precision Matrix Estimation

High Dimensional Covariance and Precision Matrix Estimation High Dimensional Covariance and Precision Matrix Estimation Wei Wang Washington University in St. Louis Thursday 23 rd February, 2017 Wei Wang (Washington University in St. Louis) High Dimensional Covariance

More information

ACCELERATED LINEARIZED BREGMAN METHOD. June 21, Introduction. In this paper, we are interested in the following optimization problem.

ACCELERATED LINEARIZED BREGMAN METHOD. June 21, Introduction. In this paper, we are interested in the following optimization problem. ACCELERATED LINEARIZED BREGMAN METHOD BO HUANG, SHIQIAN MA, AND DONALD GOLDFARB June 21, 2011 Abstract. In this paper, we propose and analyze an accelerated linearized Bregman (A) method for solving the

More information

Bindel, Fall 2009 Matrix Computations (CS 6210) Week 8: Friday, Oct 17

Bindel, Fall 2009 Matrix Computations (CS 6210) Week 8: Friday, Oct 17 Logistics Week 8: Friday, Oct 17 1. HW 3 errata: in Problem 1, I meant to say p i < i, not that p i is strictly ascending my apologies. You would want p i > i if you were simply forming the matrices and

More information

Conditions for Robust Principal Component Analysis

Conditions for Robust Principal Component Analysis Rose-Hulman Undergraduate Mathematics Journal Volume 12 Issue 2 Article 9 Conditions for Robust Principal Component Analysis Michael Hornstein Stanford University, mdhornstein@gmail.com Follow this and

More information

Proximal Gradient Descent and Acceleration. Ryan Tibshirani Convex Optimization /36-725

Proximal Gradient Descent and Acceleration. Ryan Tibshirani Convex Optimization /36-725 Proximal Gradient Descent and Acceleration Ryan Tibshirani Convex Optimization 10-725/36-725 Last time: subgradient method Consider the problem min f(x) with f convex, and dom(f) = R n. Subgradient method:

More information

Conditional Gradient (Frank-Wolfe) Method

Conditional Gradient (Frank-Wolfe) Method Conditional Gradient (Frank-Wolfe) Method Lecturer: Aarti Singh Co-instructor: Pradeep Ravikumar Convex Optimization 10-725/36-725 1 Outline Today: Conditional gradient method Convergence analysis Properties

More information

Matrices and Vectors. Definition of Matrix. An MxN matrix A is a two-dimensional array of numbers A =

Matrices and Vectors. Definition of Matrix. An MxN matrix A is a two-dimensional array of numbers A = 30 MATHEMATICS REVIEW G A.1.1 Matrices and Vectors Definition of Matrix. An MxN matrix A is a two-dimensional array of numbers A = a 11 a 12... a 1N a 21 a 22... a 2N...... a M1 a M2... a MN A matrix can

More information

Sparse Solutions of an Undetermined Linear System

Sparse Solutions of an Undetermined Linear System 1 Sparse Solutions of an Undetermined Linear System Maddullah Almerdasy New York University Tandon School of Engineering arxiv:1702.07096v1 [math.oc] 23 Feb 2017 Abstract This work proposes a research

More information

arxiv: v2 [cs.na] 24 Mar 2015

arxiv: v2 [cs.na] 24 Mar 2015 Volume X, No. 0X, 200X, X XX doi:1934/xx.xx.xx.xx PARALLEL MATRIX FACTORIZATION FOR LOW-RANK TENSOR COMPLETION arxiv:1312.1254v2 [cs.na] 24 Mar 2015 Yangyang Xu Department of Computational and Applied

More information

Lecture 9: September 28

Lecture 9: September 28 0-725/36-725: Convex Optimization Fall 206 Lecturer: Ryan Tibshirani Lecture 9: September 28 Scribes: Yiming Wu, Ye Yuan, Zhihao Li Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer: These

More information

Sparse Solutions of Linear Systems of Equations and Sparse Modeling of Signals and Images: Final Presentation

Sparse Solutions of Linear Systems of Equations and Sparse Modeling of Signals and Images: Final Presentation Sparse Solutions of Linear Systems of Equations and Sparse Modeling of Signals and Images: Final Presentation Alfredo Nava-Tudela John J. Benedetto, advisor 5/10/11 AMSC 663/664 1 Problem Let A be an n

More information

ECE G: Special Topics in Signal Processing: Sparsity, Structure, and Inference

ECE G: Special Topics in Signal Processing: Sparsity, Structure, and Inference ECE 18-898G: Special Topics in Signal Processing: Sparsity, Structure, and Inference Low-rank matrix recovery via nonconvex optimization Yuejie Chi Department of Electrical and Computer Engineering Spring

More information

Constructing Explicit RIP Matrices and the Square-Root Bottleneck

Constructing Explicit RIP Matrices and the Square-Root Bottleneck Constructing Explicit RIP Matrices and the Square-Root Bottleneck Ryan Cinoman July 18, 2018 Ryan Cinoman Constructing Explicit RIP Matrices July 18, 2018 1 / 36 Outline 1 Introduction 2 Restricted Isometry

More information

Vector spaces. DS-GA 1013 / MATH-GA 2824 Optimization-based Data Analysis.

Vector spaces. DS-GA 1013 / MATH-GA 2824 Optimization-based Data Analysis. Vector spaces DS-GA 1013 / MATH-GA 2824 Optimization-based Data Analysis http://www.cims.nyu.edu/~cfgranda/pages/obda_fall17/index.html Carlos Fernandez-Granda Vector space Consists of: A set V A scalar

More information

Rank Determination for Low-Rank Data Completion

Rank Determination for Low-Rank Data Completion Journal of Machine Learning Research 18 017) 1-9 Submitted 7/17; Revised 8/17; Published 9/17 Rank Determination for Low-Rank Data Completion Morteza Ashraphijuo Columbia University New York, NY 1007,

More information

Compressive Sensing of Sparse Tensor

Compressive Sensing of Sparse Tensor Shmuel Friedland Univ. Illinois at Chicago Matheon Workshop CSA2013 December 12, 2013 joint work with Q. Li and D. Schonfeld, UIC Abstract Conventional Compressive sensing (CS) theory relies on data representation

More information

Recovering any low-rank matrix, provably

Recovering any low-rank matrix, provably Recovering any low-rank matrix, provably Rachel Ward University of Texas at Austin October, 2014 Joint work with Yudong Chen (U.C. Berkeley), Srinadh Bhojanapalli and Sujay Sanghavi (U.T. Austin) Matrix

More information

Robust Principal Component Analysis Based on Low-Rank and Block-Sparse Matrix Decomposition

Robust Principal Component Analysis Based on Low-Rank and Block-Sparse Matrix Decomposition Robust Principal Component Analysis Based on Low-Rank and Block-Sparse Matrix Decomposition Gongguo Tang and Arye Nehorai Department of Electrical and Systems Engineering Washington University in St Louis

More information

Low-Rank Matrix Recovery

Low-Rank Matrix Recovery ELE 538B: Mathematics of High-Dimensional Data Low-Rank Matrix Recovery Yuxin Chen Princeton University, Fall 2018 Outline Motivation Problem setup Nuclear norm minimization RIP and low-rank matrix recovery

More information

Sparse Covariance Selection using Semidefinite Programming

Sparse Covariance Selection using Semidefinite Programming Sparse Covariance Selection using Semidefinite Programming A. d Aspremont ORFE, Princeton University Joint work with O. Banerjee, L. El Ghaoui & G. Natsoulis, U.C. Berkeley & Iconix Pharmaceuticals Support

More information

Solving Corrupted Quadratic Equations, Provably

Solving Corrupted Quadratic Equations, Provably Solving Corrupted Quadratic Equations, Provably Yuejie Chi London Workshop on Sparse Signal Processing September 206 Acknowledgement Joint work with Yuanxin Li (OSU), Huishuai Zhuang (Syracuse) and Yingbin

More information

IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 57, NO. 11, NOVEMBER On the Performance of Sparse Recovery

IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 57, NO. 11, NOVEMBER On the Performance of Sparse Recovery IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 57, NO. 11, NOVEMBER 2011 7255 On the Performance of Sparse Recovery Via `p-minimization (0 p 1) Meng Wang, Student Member, IEEE, Weiyu Xu, and Ao Tang, Senior

More information

A Customized ADMM for Rank-Constrained Optimization Problems with Approximate Formulations

A Customized ADMM for Rank-Constrained Optimization Problems with Approximate Formulations A Customized ADMM for Rank-Constrained Optimization Problems with Approximate Formulations Chuangchuang Sun and Ran Dai Abstract This paper proposes a customized Alternating Direction Method of Multipliers

More information

Linear and Multilinear Algebra. Linear maps preserving rank of tensor products of matrices

Linear and Multilinear Algebra. Linear maps preserving rank of tensor products of matrices Linear maps preserving rank of tensor products of matrices Journal: Manuscript ID: GLMA-0-0 Manuscript Type: Original Article Date Submitted by the Author: -Aug-0 Complete List of Authors: Zheng, Baodong;

More information

Supplementary Materials for Riemannian Pursuit for Big Matrix Recovery

Supplementary Materials for Riemannian Pursuit for Big Matrix Recovery Supplementary Materials for Riemannian Pursuit for Big Matrix Recovery Mingkui Tan, School of omputer Science, The University of Adelaide, Australia Ivor W. Tsang, IS, University of Technology Sydney,

More information

Fixed point and Bregman iterative methods for matrix rank minimization

Fixed point and Bregman iterative methods for matrix rank minimization Math. Program., Ser. A manuscript No. (will be inserted by the editor) Shiqian Ma Donald Goldfarb Lifeng Chen Fixed point and Bregman iterative methods for matrix rank minimization October 27, 2008 Abstract

More information

Sparse and low-rank decomposition for big data systems via smoothed Riemannian optimization

Sparse and low-rank decomposition for big data systems via smoothed Riemannian optimization Sparse and low-rank decomposition for big data systems via smoothed Riemannian optimization Yuanming Shi ShanghaiTech University, Shanghai, China shiym@shanghaitech.edu.cn Bamdev Mishra Amazon Development

More information

Compressed sensing. Or: the equation Ax = b, revisited. Terence Tao. Mahler Lecture Series. University of California, Los Angeles

Compressed sensing. Or: the equation Ax = b, revisited. Terence Tao. Mahler Lecture Series. University of California, Los Angeles Or: the equation Ax = b, revisited University of California, Los Angeles Mahler Lecture Series Acquiring signals Many types of real-world signals (e.g. sound, images, video) can be viewed as an n-dimensional

More information

User s Guide for LMaFit: Low-rank Matrix Fitting

User s Guide for LMaFit: Low-rank Matrix Fitting User s Guide for LMaFit: Low-rank Matrix Fitting Yin Zhang Department of CAAM Rice University, Houston, Texas, 77005 (CAAM Technical Report TR09-28) (Versions beta-1: August 23, 2009) Abstract This User

More information

Lecture 5 : Projections

Lecture 5 : Projections Lecture 5 : Projections EE227C. Lecturer: Professor Martin Wainwright. Scribe: Alvin Wan Up until now, we have seen convergence rates of unconstrained gradient descent. Now, we consider a constrained minimization

More information

Lecture 2: Linear Algebra Review

Lecture 2: Linear Algebra Review EE 227A: Convex Optimization and Applications January 19 Lecture 2: Linear Algebra Review Lecturer: Mert Pilanci Reading assignment: Appendix C of BV. Sections 2-6 of the web textbook 1 2.1 Vectors 2.1.1

More information

EE/ACM Applications of Convex Optimization in Signal Processing and Communications Lecture 2

EE/ACM Applications of Convex Optimization in Signal Processing and Communications Lecture 2 EE/ACM 150 - Applications of Convex Optimization in Signal Processing and Communications Lecture 2 Andre Tkacenko Signal Processing Research Group Jet Propulsion Laboratory April 5, 2012 Andre Tkacenko

More information

Uses of duality. Geoff Gordon & Ryan Tibshirani Optimization /

Uses of duality. Geoff Gordon & Ryan Tibshirani Optimization / Uses of duality Geoff Gordon & Ryan Tibshirani Optimization 10-725 / 36-725 1 Remember conjugate functions Given f : R n R, the function is called its conjugate f (y) = max x R n yt x f(x) Conjugates appear

More information

Matrix Completion from a Few Entries

Matrix Completion from a Few Entries Matrix Completion from a Few Entries Raghunandan H. Keshavan and Sewoong Oh EE Department Stanford University, Stanford, CA 9434 Andrea Montanari EE and Statistics Departments Stanford University, Stanford,

More information

Gauge optimization and duality

Gauge optimization and duality 1 / 54 Gauge optimization and duality Junfeng Yang Department of Mathematics Nanjing University Joint with Shiqian Ma, CUHK September, 2015 2 / 54 Outline Introduction Duality Lagrange duality Fenchel

More information

Abstract. 1. Introduction

Abstract. 1. Introduction Journal of Computational Mathematics Vol.28, No.2, 2010, 273 288. http://www.global-sci.org/jcm doi:10.4208/jcm.2009.10-m2870 UNIFORM SUPERCONVERGENCE OF GALERKIN METHODS FOR SINGULARLY PERTURBED PROBLEMS

More information

Penalty Decomposition Methods for Rank and l 0 -Norm Minimization

Penalty Decomposition Methods for Rank and l 0 -Norm Minimization IPAM, October 12, 2010 p. 1/54 Penalty Decomposition Methods for Rank and l 0 -Norm Minimization Zhaosong Lu Simon Fraser University Joint work with Yong Zhang (Simon Fraser) IPAM, October 12, 2010 p.

More information

STAT 309: MATHEMATICAL COMPUTATIONS I FALL 2013 PROBLEM SET 2

STAT 309: MATHEMATICAL COMPUTATIONS I FALL 2013 PROBLEM SET 2 STAT 309: MATHEMATICAL COMPUTATIONS I FALL 2013 PROBLEM SET 2 1. You are not allowed to use the svd for this problem, i.e. no arguments should depend on the svd of A or A. Let W be a subspace of C n. The

More information

Linear Regression and Its Applications

Linear Regression and Its Applications Linear Regression and Its Applications Predrag Radivojac October 13, 2014 Given a data set D = {(x i, y i )} n the objective is to learn the relationship between features and the target. We usually start

More information

Elaine T. Hale, Wotao Yin, Yin Zhang

Elaine T. Hale, Wotao Yin, Yin Zhang , Wotao Yin, Yin Zhang Department of Computational and Applied Mathematics Rice University McMaster University, ICCOPT II-MOPTA 2007 August 13, 2007 1 with Noise 2 3 4 1 with Noise 2 3 4 1 with Noise 2

More information

A New Estimate of Restricted Isometry Constants for Sparse Solutions

A New Estimate of Restricted Isometry Constants for Sparse Solutions A New Estimate of Restricted Isometry Constants for Sparse Solutions Ming-Jun Lai and Louis Y. Liu January 12, 211 Abstract We show that as long as the restricted isometry constant δ 2k < 1/2, there exist

More information

Lecture 25: November 27

Lecture 25: November 27 10-725: Optimization Fall 2012 Lecture 25: November 27 Lecturer: Ryan Tibshirani Scribes: Matt Wytock, Supreeth Achar Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer: These notes have

More information

Key Words: low-rank matrix recovery, iteratively reweighted least squares, matrix completion.

Key Words: low-rank matrix recovery, iteratively reweighted least squares, matrix completion. LOW-RANK MATRIX RECOVERY VIA ITERATIVELY REWEIGHTED LEAST SQUARES MINIMIZATION MASSIMO FORNASIER, HOLGER RAUHUT, AND RACHEL WARD Abstract. We present and analyze an efficient implementation of an iteratively

More information

Nonnegative Tensor Factorization using a proximal algorithm: application to 3D fluorescence spectroscopy

Nonnegative Tensor Factorization using a proximal algorithm: application to 3D fluorescence spectroscopy Nonnegative Tensor Factorization using a proximal algorithm: application to 3D fluorescence spectroscopy Caroline Chaux Joint work with X. Vu, N. Thirion-Moreau and S. Maire (LSIS, Toulon) Aix-Marseille

More information

A MODIFIED TSVD METHOD FOR DISCRETE ILL-POSED PROBLEMS

A MODIFIED TSVD METHOD FOR DISCRETE ILL-POSED PROBLEMS A MODIFIED TSVD METHOD FOR DISCRETE ILL-POSED PROBLEMS SILVIA NOSCHESE AND LOTHAR REICHEL Abstract. Truncated singular value decomposition (TSVD) is a popular method for solving linear discrete ill-posed

More information

Lecture Notes 1: Vector spaces

Lecture Notes 1: Vector spaces Optimization-based data analysis Fall 2017 Lecture Notes 1: Vector spaces In this chapter we review certain basic concepts of linear algebra, highlighting their application to signal processing. 1 Vector

More information

THE SINGULAR VALUE DECOMPOSITION MARKUS GRASMAIR

THE SINGULAR VALUE DECOMPOSITION MARKUS GRASMAIR THE SINGULAR VALUE DECOMPOSITION MARKUS GRASMAIR 1. Definition Existence Theorem 1. Assume that A R m n. Then there exist orthogonal matrices U R m m V R n n, values σ 1 σ 2... σ p 0 with p = min{m, n},

More information

Homework 4. Convex Optimization /36-725

Homework 4. Convex Optimization /36-725 Homework 4 Convex Optimization 10-725/36-725 Due Friday November 4 at 5:30pm submitted to Christoph Dann in Gates 8013 (Remember to a submit separate writeup for each problem, with your name at the top)

More information

Low rank matrix completion by alternating steepest descent methods

Low rank matrix completion by alternating steepest descent methods SIAM J. IMAGING SCIENCES Vol. xx, pp. x c xxxx Society for Industrial and Applied Mathematics x x Low rank matrix completion by alternating steepest descent methods Jared Tanner and Ke Wei Abstract. Matrix

More information

Guaranteed Minimum-Rank Solutions of Linear Matrix Equations via Nuclear Norm Minimization

Guaranteed Minimum-Rank Solutions of Linear Matrix Equations via Nuclear Norm Minimization Forty-Fifth Annual Allerton Conference Allerton House, UIUC, Illinois, USA September 26-28, 27 WeA3.2 Guaranteed Minimum-Rank Solutions of Linear Matrix Equations via Nuclear Norm Minimization Benjamin

More information

Linear Algebra for Machine Learning. Sargur N. Srihari

Linear Algebra for Machine Learning. Sargur N. Srihari Linear Algebra for Machine Learning Sargur N. srihari@cedar.buffalo.edu 1 Overview Linear Algebra is based on continuous math rather than discrete math Computer scientists have little experience with it

More information