$S_{1/2}$ Regularization Methods and Fixed Point Algorithms for Affine Rank Minimization Problems


Dingtao Peng, Naihua Xiu and Jian Yu

Abstract. The affine rank minimization problem is to minimize the rank of a matrix under linear constraints. It has many applications in various areas such as statistics, control, system identification and machine learning. Unlike the literature that uses the nuclear norm or the general Schatten $q$ $(0<q<1)$ quasi-norm to approximate the rank of a matrix, in this paper we use the Schatten $1/2$ quasi-norm approximation, which is a better approximation of the rank than the nuclear norm but leads to a nonconvex, nonsmooth and non-Lipschitz optimization problem. Importantly, we give a globally necessary optimality condition for the $S_{1/2}$ regularization problem by virtue of its special objective function. This is very different from the local optimality conditions usually used for general $S_q$ regularization problems. Explicitly, the global optimality condition for the $S_{1/2}$ regularization problem is a fixed point equation associated with the singular value half thresholding operator. Naturally, we propose a fixed point iterative scheme for the problem and provide the convergence analysis of this iteration. By discussing the location and setting of the optimal regularization parameter, and by using an approximate singular value decomposition procedure, we obtain a very efficient algorithm for the $S_{1/2}$ regularization problem: the half norm fixed point algorithm with an approximate SVD (HFPA algorithm). Numerical experiments on randomly generated and real matrix completion problems are presented to demonstrate the effectiveness of the proposed algorithm.

Key words. affine rank minimization problem; matrix completion problem; $S_{1/2}$ regularization problem; fixed point algorithm; singular value half thresholding operator

AMS Subject Classification. 90C06, 90C26, 90C59, 65F22

1 Introduction

The affine rank minimization problem, which is to minimize the rank of a matrix under linear constraints, can be described as follows:
$$\min_{X\in\mathbb{R}^{m\times n}}\ \operatorname{rank}(X)\quad \text{s.t.}\quad \mathcal{A}(X)=b, \qquad (1.1)$$
where $b\in\mathbb{R}^p$ is a given vector and $\mathcal{A}:\mathbb{R}^{m\times n}\to\mathbb{R}^p$ is a given linear transformation determined by $p$ matrices $A_1,\dots,A_p\in\mathbb{R}^{m\times n}$ via
$$\mathcal{A}(X):=\big[\langle A_1,X\rangle,\dots,\langle A_p,X\rangle\big]^T \quad \text{for all } X\in\mathbb{R}^{m\times n},$$
with $\langle A_i,X\rangle:=\operatorname{trace}(A_i^TX)$, $i=1,\dots,p$.

College of Science, Guizhou University, Guiyang, Guizhou, China; and School of Science, Beijing Jiaotong University, Beijing, China (dingtaopeng@126.com). This author was supported by the NSFC grant and the Guizhou Provincial Science and Technology Foundation grant. School of Science, Beijing Jiaotong University, Beijing, China (nhxiu@bjtu.edu.cn). This author was supported by the National Basic Research Program of China grant 2010CB and the NSFC grant. College of Science, Guizhou University, Guiyang, Guizhou, China (sci.jyu@gzu.edu.cn).

An important special case of (1.1) is the matrix completion problem [6]
$$\min_{X\in\mathbb{R}^{m\times n}}\ \operatorname{rank}(X)\quad \text{s.t.}\quad X_{ij}=M_{ij},\ (i,j)\in\Omega, \qquad (1.2)$$
where $X$ and $M$ are both $m\times n$ matrices, $\Omega$ is a subset of index pairs $(i,j)$, and a small subset $\{M_{ij}\mid(i,j)\in\Omega\}$ of the entries is known.

Many applications arising in various areas can be captured by the model (1.1), for instance, low-degree statistical models for a random process [17, 36], low-order realization of linear control systems [19, 37], low-dimensional embedding of data in Euclidean spaces [20], system identification in engineering [28], machine learning [32], and other applications [18]. The matrix completion problem (1.2) arises frequently; examples include the Netflix problem, global positioning, remote sensing and so on [5, 6]. Moreover, problem (1.1) is an extension of the well-known sparse signal recovery (or compressed sensing) problem, which is formulated as finding a sparsest solution of an underdetermined system of linear equations [7, 15].

Problem (1.1) was considered by Fazel [18], who analyzed its computational complexity and proved that it is NP-hard. To overcome this difficulty, Fazel [18] and other researchers (e.g., [6, 8, 34]) suggested relaxing the rank of $X$ by the nuclear norm, that is, considering the nuclear norm minimization problem
$$\min_{X\in\mathbb{R}^{m\times n}}\ \|X\|_*\quad \text{s.t.}\quad \mathcal{A}(X)=b, \qquad (1.3)$$
or the nuclear norm regularization problem
$$\min_{X\in\mathbb{R}^{m\times n}}\ \|\mathcal{A}(X)-b\|_2^2+\lambda\|X\|_* \qquad (1.4)$$
if the data contain noise, where $\|X\|_*$ is the nuclear norm of $X$, i.e., the sum of its singular values. It is well known that problems (1.3) and (1.4) are both convex and can therefore be solved more easily (at least in theory) than (1.1). Many existing algorithms rely on the nuclear norm. For example, problem (1.3) can be reformulated as a semidefinite program [34] and solved by SDPT3 [41]; Lin et al. [26] and Tao and Yuan [39] adopt augmented Lagrangian multiplier (ALM) methods to solve robust PCA problems and their extensions, which contain the matrix completion problem as a special case; SVT [4] solves (1.3) by applying a singular value thresholding operator; Toh and Yun [40] solve a general model that contains (1.3) as a special case by an accelerated proximal gradient (APG) method; Liu, Sun and Toh [27] present a framework of proximal point algorithms in the primal, dual and primal-dual forms for solving nuclear norm minimization with linear equality and second order cone constraints; Ma, Goldfarb and Chen [29] proposed fixed point and Bregman iterative algorithms for solving problem (1.3).

Considering the nonconvexity of the original problem (1.1), some researchers [23, 25, 30, 31, 33] suggest using the Schatten $q$ $(0<q<1)$ quasi-norm (for short, $q$ norm) relaxation, that is, solving the $q$-norm minimization problem
$$\min_{X\in\mathbb{R}^{m\times n}}\ \|X\|_q^q\quad \text{s.t.}\quad \mathcal{A}(X)=b, \qquad (1.5)$$
or the $S_q$ regularization problem
$$\min_{X\in\mathbb{R}^{m\times n}}\ \|\mathcal{A}(X)-b\|_2^2+\lambda\|X\|_q^q \qquad (1.6)$$
if the data contain noise, where the Schatten $q$ quasi-norm of $X$ is defined by $\|X\|_q^q:=\sum_{i=1}^{\min\{m,n\}}\sigma_i^q$ and $\sigma_i$ ($i=1,\dots,\min\{m,n\}$) are the singular values of $X$. Problem (1.5) is intermediate between (1.1) and (1.3) in the sense that
$$\operatorname{rank}(X)=\sum_{i=1}^{\min\{m,n\}}\sigma_i^0,\qquad \|X\|_q^q=\sum_{i=1}^{\min\{m,n\}}\sigma_i^q,\qquad \|X\|_*=\sum_{i=1}^{\min\{m,n\}}\sigma_i^1.$$
Obviously, the $q$ quasi-norm is a better approximation of the rank function than the nuclear norm, but it leads to a nonconvex, nonsmooth, non-Lipschitz optimization problem whose global minimizers are difficult to find.

In fact, the nonconvex relaxation method was first proposed in the area of sparse signal recovery [9, 10]. Recently, nonconvex regularization methods associated with the $\ell_q$ $(0<q<1)$ norm have attracted much attention, and many theoretical results and algorithms have been developed for the resulting nonconvex, nonsmooth, even non-Lipschitz optimization problems; see, e.g., [2, 14, 23, 25, 31]. Extensive computational results have shown that using the $\ell_q$ norm can recover very sparse solutions from very few measurements; see, e.g., [9-14, 25, 31, 36, 45]. However, since the $\ell_q$ norm minimization is a nonconvex, nonsmooth and non-Lipschitz problem, it is in general difficult to give a theoretical guarantee of finding a global solution. Moreover, which $q$ should be selected is another interesting problem. The results in [43-45] revealed that the $\ell_{1/2}$ relaxation can in some sense be regarded as a representative among all the $\ell_q$ relaxations with $q\in(0,1)$: the $\ell_{1/2}$ relaxation has a more powerful recovering ability than the $\ell_q$ relaxation for $1/2<q<1$, while the recovering ability differs little between the $\ell_{1/2}$ relaxation and the $\ell_q$ relaxation for $0<q<1/2$. Moreover, Xu et al. [44] in fact provided a globally necessary optimality condition for the $\ell_{1/2}$ regularization problem, expressed as a fixed point equation involving the half thresholding function; this condition may not hold at local minimizers. They then developed a fast iterative half thresholding algorithm for the $\ell_{1/2}$ regularization problem, which matches the iterative hard thresholding algorithm for the $\ell_0$ regularization problem and the iterative soft thresholding algorithm for the $\ell_1$ regularization problem.

In this paper, inspired by the works on nonconvex regularization, especially the $\ell_{1/2}$ regularization mentioned above, we focus our attention on the following $S_{1/2}$ regularization problem:
$$\min_{X\in\mathbb{R}^{m\times n}}\ \Big\{\|\mathcal{A}(X)-b\|_2^2+\lambda\|X\|_{1/2}^{1/2}\Big\}, \qquad (1.7)$$
where $\|X\|_{1/2}^{1/2}=\sum_{i=1}^{\min\{m,n\}}\sigma_i^{1/2}$ and $\sigma_i$ ($i=1,\dots,\min\{m,n\}$) are the singular values of $X$.
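For concreteness, the following minimal sketch (NumPy; the helper names `schatten_q_quasinorm`, `s_half_objective` and the callable `A_op` are illustrative choices, not from the paper) evaluates the Schatten $q$ quasi-norm and the objective of (1.7) from the singular values of $X$.

```python
import numpy as np

def schatten_q_quasinorm(X, q=0.5):
    """||X||_q^q = sum_i sigma_i(X)^q; q = 1/2 gives ||X||_{1/2}^{1/2}."""
    sigma = np.linalg.svd(X, compute_uv=False)
    return float(np.sum(sigma ** q))

def s_half_objective(X, A_op, b, lam):
    """Objective of (1.7): ||A(X) - b||_2^2 + lam * ||X||_{1/2}^{1/2},
    where A_op is any callable implementing the linear map A."""
    r = A_op(X) - b
    return float(r @ r) + lam * schatten_q_quasinorm(X, 0.5)
```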

This paper is organized as follows. In Section 2, we briefly discuss the relation between the global minimizers of problem (1.5) and problem (1.6). In Section 3, we deduce an analytical thresholding expression associated with the solutions to problem (1.7), and establish an exact lower bound on the nonzero singular values of the solutions. Moreover, we prove that the solutions to problem (1.7) are fixed points of a matrix-valued thresholding operator. In Section 4, based on the fixed point condition, we give a natural iterative formula and provide the convergence analysis of the proposed iteration. Section 5 discusses the location of the optimal regularization parameter and the setting of the parameter, which coincides with the fixed point continuation technique used in convex optimization. Since the singular value decomposition is computationally expensive, in Section 6 we employ an approximate singular value decomposition procedure to cut the computational cost. Thus we obtain a very fast, robust and powerful algorithm, which we call the HFPA algorithm (half norm fixed point algorithm with an approximate SVD). Numerical experiments on randomly generated and real matrix completion problems are presented in Section 7 to demonstrate the effectiveness of the HFPA algorithm. Finally, we conclude our results in Section 8.

Before continuing, we summarize the notation used in this paper. Throughout, without loss of generality, we always suppose $m\le n$. Let $\|x\|_2$ denote the Euclidean norm of any vector $x\in\mathbb{R}^p$. For any $x,y\in\mathbb{R}^p$, $\langle x,y\rangle=x^Ty$ denotes the inner product of two vectors. For any matrix $X\in\mathbb{R}^{m\times n}$, $\sigma(X)=(\sigma_1(X),\dots,\sigma_m(X))^T$ denotes the vector of singular values of $X$ arranged in nonincreasing order, written simply as $\sigma=(\sigma_1,\dots,\sigma_m)^T$ if no confusion is caused; $\operatorname{Diag}(\sigma(X))$ denotes a diagonal matrix whose diagonal vector is $\sigma(X)$; and $\|X\|_F$ denotes the Frobenius norm of $X$, i.e., $\|X\|_F=\big(\sum_{i,j}X_{ij}^2\big)^{1/2}=\big(\sum_{i=1}^m\sigma_i^2\big)^{1/2}$. For any $X,Y\in\mathbb{R}^{m\times n}$, $\langle X,Y\rangle=\operatorname{tr}(Y^TX)$ denotes the inner product of two matrices. Let the linear transformation $\mathcal{A}:\mathbb{R}^{m\times n}\to\mathbb{R}^p$ be determined by $p$ given matrices $A_1,\dots,A_p\in\mathbb{R}^{m\times n}$, that is, $\mathcal{A}(X)=\big(\langle A_1,X\rangle,\dots,\langle A_p,X\rangle\big)^T$. Define $A=(\operatorname{vec}(A_1),\dots,\operatorname{vec}(A_p))^T\in\mathbb{R}^{p\times mn}$ and $x=\operatorname{vec}(X)\in\mathbb{R}^{mn}$, where $\operatorname{vec}(\cdot)$ is the stretch operator; then we have $\mathcal{A}(X)=Ax$ and $\|\mathcal{A}(X)\|_2\le\|\mathcal{A}\|\,\|X\|_F$, where $\|\mathcal{A}\|:=\max\{\|\mathcal{A}(X)\|_2:\|X\|_F=1\}=\|A\|_2$ and $\|A\|_2$ is the spectral norm of the matrix $A$. Let $\mathcal{A}^*$ denote the adjoint of $\mathcal{A}$. Then for any $y\in\mathbb{R}^p$, we have $\mathcal{A}^*y=\sum_{i=1}^p y_iA_i$ and $\langle\mathcal{A}(X),y\rangle=\langle X,\mathcal{A}^*y\rangle=\langle\operatorname{vec}(X),\operatorname{vec}(\mathcal{A}^*y)\rangle=\langle x,A^Ty\rangle$.
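The following sketch (NumPy; the factory names are illustrative, not from the paper) builds the pair $\mathcal{A},\mathcal{A}^*$ from general matrices $A_1,\dots,A_p$, and the sampling operator of the matrix completion problem (1.2) as a special case.

```python
import numpy as np

def make_affine_operator(A_mats):
    """A(X) = (<A_1,X>, ..., <A_p,X>)^T and its adjoint A^*(y) = sum_i y_i A_i,
    built from a list of m x n matrices A_1, ..., A_p."""
    def A_op(X):
        return np.array([np.tensordot(Ai, X) for Ai in A_mats])  # <A_i, X> = tr(A_i^T X)
    def A_adj(y):
        return sum(yi * Ai for yi, Ai in zip(y, A_mats))
    return A_op, A_adj

def make_completion_operator(mask):
    """Sampling operator of (1.2): A(X) returns the entries of X on Omega (a boolean
    mask); A^*(y) scatters them back into a zero matrix.  Here ||A|| = 1."""
    idx = np.where(mask)
    def A_op(X):
        return X[idx]
    def A_adj(y):
        Z = np.zeros(mask.shape)
        Z[idx] = y
        return Z
    return A_op, A_adj
```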

2 Relation between global minimizers of problem (1.5) and problem (1.6)

We now show that, in some sense, problem (1.5) can be solved by solving problem (1.6). The theorem here is general and covers problem (1.7) as a special case. We note that the regularization term $\|X\|_q^q$ is nonconvex, nonsmooth and non-Lipschitz, hence the result is nontrivial.

Theorem 2.1 For each $\lambda>0$, the set of global minimizers of (1.6) is nonempty and bounded. Let $\{\lambda_k\}$ be a decreasing sequence of positive numbers with $\lambda_k\to 0$, and let $X_{\lambda_k}$ be a global minimizer of problem (1.6) with $\lambda=\lambda_k$. Suppose that problem (1.5) is feasible; then $\{X_{\lambda_k}\}$ is bounded and any of its accumulation points is a global minimizer of problem (1.5).

Proof. Since $C_\lambda(X):=\|\mathcal{A}(X)-b\|_2^2+\lambda\|X\|_q^q\ge\lambda\|X\|_q^q$, the objective function $C_\lambda(X)$ is bounded from below and is coercive, i.e., $C_\lambda(X)\to\infty$ as $\|X\|_F\to\infty$; hence the set of global minimizers of (1.6) is nonempty and bounded.

Suppose that problem (1.5) is feasible and $\bar X$ is any feasible point, so that $\mathcal{A}(\bar X)=b$. Since $X_{\lambda_k}$ is a global minimizer of problem (1.6) with $\lambda=\lambda_k$, we have
$$\max\big\{\lambda_k\|X_{\lambda_k}\|_q^q,\ \|\mathcal{A}(X_{\lambda_k})-b\|_2^2\big\}\le\lambda_k\|X_{\lambda_k}\|_q^q+\|\mathcal{A}(X_{\lambda_k})-b\|_2^2\le\lambda_k\|\bar X\|_q^q+\|\mathcal{A}(\bar X)-b\|_2^2=\lambda_k\|\bar X\|_q^q.$$
From $\lambda_k\|X_{\lambda_k}\|_q^q\le\lambda_k\|\bar X\|_q^q$, we get $\|X_{\lambda_k}\|_q^q\le\|\bar X\|_q^q$, that is, the sequence $\{X_{\lambda_k}\}$ is bounded. Thus, $\{X_{\lambda_k}\}$ has at least one accumulation point. Let $X^*$ be any accumulation point of $\{X_{\lambda_k}\}$. From $\|\mathcal{A}(X_{\lambda_k})-b\|_2^2\le\lambda_k\|\bar X\|_q^q$ and $\lambda_k\to 0$, we derive $\mathcal{A}(X^*)=b$, that is, $X^*$ is a feasible point of problem (1.5). It follows from $\|X_{\lambda_k}\|_q^q\le\|\bar X\|_q^q$ that $\|X^*\|_q^q\le\|\bar X\|_q^q$. Then, by the arbitrariness of $\bar X$, we obtain that $X^*$ is a global minimizer of problem (1.5).

3 Globally necessary optimality condition

In this section, we give a globally necessary optimality condition for problem (1.7), which may not hold at local minimizers. This condition is expressed as a matrix-valued fixed point equation associated with a special thresholding operator, which we call the half thresholding operator. Before studying the $S_{1/2}$ regularization problem, we begin by introducing the half thresholding operator.

3.1 Half thresholding operator

First, we introduce the half thresholding function, which minimizes a real-valued function. The following key lemma follows from, but is different from, Xu et al. [44].

Lemma 3.1 Let $t\in\mathbb{R}$ and $\lambda>0$ be two given real numbers. Suppose that $x^*\in\mathbb{R}$ is a global minimizer of the problem
$$\min_{x\ge 0}\ f(x):=(x-t)^2+\lambda x^{1/2}. \qquad (3.1)$$
Then $x^*$ is uniquely determined by (3.1) when $t\neq\frac{\sqrt[3]{54}}{4}\lambda^{2/3}$, and can be analytically expressed by
$$x^*=h_\lambda(t):=\begin{cases} h_{\lambda,1/2}(t), & \text{if } t>\frac{\sqrt[3]{54}}{4}\lambda^{2/3},\\[2pt] \in\{h_{\lambda,1/2}(t),\,0\}, & \text{if } t=\frac{\sqrt[3]{54}}{4}\lambda^{2/3},\\[2pt] 0, & \text{if } t<\frac{\sqrt[3]{54}}{4}\lambda^{2/3}, \end{cases} \qquad (3.2)$$
where
$$h_{\lambda,1/2}(t)=\frac{2}{3}t\left(1+\cos\Big(\frac{2\pi}{3}-\frac{2}{3}\varphi_\lambda(t)\Big)\right) \qquad (3.3)$$
with
$$\varphi_\lambda(t)=\arccos\left(\frac{\lambda}{8}\Big(\frac{t}{3}\Big)^{-3/2}\right). \qquad (3.4)$$

Proof. First, we consider the positive stationary points of (3.1). The first order optimality condition of (3.1) gives
$$x-t+\frac{\lambda}{4\sqrt{x}}=0. \qquad (3.5)$$
This equation can have positive roots only if $t>0$; if $t\le 0$, then $f(x)$ is increasing on $[0,+\infty)$ and $x^*=0$ is the unique minimizer of (3.1). Hence, we only need to consider $t>0$ from now on.

By solving equation (3.5) and comparing the values of $f$ at each root of (3.5), Xu et al. [44] showed that $\bar x=h_{\lambda,1/2}(t)$ defined by (3.3) is the unique positive stationary point of (3.1) at which $f$ attains the smallest value among all its positive stationary points (see (14), (15) and (16) in [44]; we note that, in (16), $x_i>\frac{\sqrt[3]{54}}{4}\lambda^{2/3}$ is not necessary; in fact, $x_i>0$ is enough). The remaining task is to compare the values $f(\bar x)$ and $f(0)$. Fortunately, Xu et al. (see Lemma 1 and Lemma 2 in [44]) showed that
$$f(\bar x)<f(0)\ \Longleftrightarrow\ t>\frac{\sqrt[3]{54}}{4}\lambda^{2/3} \qquad\text{and}\qquad f(\bar x)=f(0)\ \Longleftrightarrow\ t=\frac{\sqrt[3]{54}}{4}\lambda^{2/3}.$$
The remaining case is naturally
$$f(\bar x)>f(0)\ \Longleftrightarrow\ t<\frac{\sqrt[3]{54}}{4}\lambda^{2/3}.$$
The above three relationships imply
$$x^*=\begin{cases}\bar x, & \text{if } t>\frac{\sqrt[3]{54}}{4}\lambda^{2/3},\\[2pt] \in\{\bar x,\,0\}, & \text{if } t=\frac{\sqrt[3]{54}}{4}\lambda^{2/3},\\[2pt] 0, & \text{if } t<\frac{\sqrt[3]{54}}{4}\lambda^{2/3},\end{cases}$$
which completes the proof.

Figure 1 shows the minimizers of the function $f(x)$ for two different pairs $(t,\lambda)$: in (a) $t=2$, $\lambda=8$, and in (b) $t=4$, $\lambda=8$. In (b), $x=0$ is a local minimizer of $f(x)$; meanwhile, since $t=4>\frac{\sqrt[3]{54}}{4}\lambda^{2/3}=\sqrt[3]{54}\approx 3.78$, the global minimizer is $\bar x=h_{\lambda,1/2}(4)>0$.

Figure 1: The minimizers of the function $f(x)$ with two different pairs of $(t,\lambda)$.

Lemma 3.2 (Appendix A in [44]) If $t>\frac{\sqrt[3]{54}}{4}\lambda^{2/3}$, then the function $h_\lambda(t)$ is strictly increasing.
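As a concrete illustration of Lemma 3.1, the scalar rule (3.2)-(3.4) can be coded directly (a NumPy sketch; the function name is ours). At the boundary case $t=\frac{\sqrt[3]{54}}{4}\lambda^{2/3}$, where both $0$ and $h_{\lambda,1/2}(t)$ are global minimizers, the sketch returns $0$.

```python
import numpy as np

def h_half(t, lam):
    """Scalar half thresholding (3.2)-(3.4): a global minimizer of
    f(x) = (x - t)^2 + lam * sqrt(x) over x >= 0."""
    thresh = (54.0 ** (1.0 / 3.0) / 4.0) * lam ** (2.0 / 3.0)
    if t <= thresh:
        return 0.0
    phi = np.arccos((lam / 8.0) * (t / 3.0) ** (-1.5))                              # (3.4)
    return (2.0 / 3.0) * t * (1.0 + np.cos(2.0 * np.pi / 3.0 - 2.0 * phi / 3.0))    # (3.3)

# Example from Figure 1: t = 4, lam = 8 gives a positive global minimizer (panel (b)),
# while t = 2, lam = 8 is thresholded to zero (panel (a)).
print(h_half(4.0, 8.0), h_half(2.0, 8.0))
```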

Similar to [33, 44], using $h_\lambda(\cdot)$ defined in Lemma 3.1, we can define the following half thresholding function and half thresholding operators.

Definition 3.3 (Half thresholding function) Assume $t\in\mathbb{R}$. For any $\lambda>0$, the function $h_\lambda(\cdot)$ defined by (3.2)-(3.4) is called a half thresholding function.

Definition 3.4 (Vector half thresholding operator) For any $\lambda>0$, the vector half thresholding operator $H_\lambda(\cdot)$ is defined as
$$H_\lambda(x):=\big(h_\lambda(x_1),h_\lambda(x_2),\dots,h_\lambda(x_n)\big)^T,\quad x\in\mathbb{R}^n.$$

Definition 3.5 (Matrix half thresholding operator) Suppose $Y\in\mathbb{R}^{m\times n}$ of rank $r$ admits a singular value decomposition (SVD)
$$Y=U\operatorname{Diag}(\sigma)V^T,$$
where $U$ and $V$ are, respectively, $m\times r$ and $n\times r$ matrices with orthonormal columns, and the vector $\sigma=(\sigma_1,\sigma_2,\dots,\sigma_r)^T$ consists of the positive singular values of $Y$ arranged in nonincreasing order. (Unless specified otherwise, we will always suppose the SVD of a matrix is given in this reduced form.) For any $\lambda>0$, the matrix half thresholding operator $\mathcal{H}_\lambda(\cdot):\mathbb{R}^{m\times n}\to\mathbb{R}^{m\times n}$ is defined by
$$\mathcal{H}_\lambda(Y):=U\operatorname{Diag}(H_\lambda(\sigma))V^T.$$

In what follows, we will see that the matrix half thresholding operator defined above is in fact a proximal operator associated with $\|X\|_{1/2}^{1/2}$, a nonconvex and non-Lipschitz function. In some sense this can be regarded as an extension of the well-known proximal operator associated with convex functions [27, 35].

Lemma 3.6 The global minimizer $X_s$ of the problem
$$\min_{X\in\mathbb{R}^{m\times n}}\ \|X-Y\|_F^2+\lambda\|X\|_{1/2}^{1/2} \qquad (3.6)$$
can be analytically given by $X_s=\mathcal{H}_\lambda(Y)$.

Proof. See the Appendix.
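A minimal NumPy sketch of the matrix half thresholding operator of Definition 3.5 follows (the function name is ours; the scalar rule (3.2)-(3.4) is vectorized inline so the sketch is self-contained). By Lemma 3.6 this is the proximal step for $\lambda\|X\|_{1/2}^{1/2}$.

```python
import numpy as np

def matrix_half_threshold(Y, lam):
    """H_lam(Y) = U Diag(h_lam(sigma)) V^T: apply the scalar half thresholding
    (3.2)-(3.4) to the singular values of Y.  By Lemma 3.6, the result is a
    global minimizer of ||X - Y||_F^2 + lam * ||X||_{1/2}^{1/2}."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    thresh = (54.0 ** (1.0 / 3.0) / 4.0) * lam ** (2.0 / 3.0)
    keep = s > thresh
    s_new = np.zeros_like(s)
    if np.any(keep):
        phi = np.arccos((lam / 8.0) * (s[keep] / 3.0) ** (-1.5))
        s_new[keep] = (2.0 / 3.0) * s[keep] * (1.0 + np.cos(2.0 * np.pi / 3.0 - 2.0 * phi / 3.0))
    return (U * s_new) @ Vt
```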

3.2 Fixed point equation for global minimizers

Now we can consider our $S_{1/2}$ regularization problem (1.7):
$$\min_{X\in\mathbb{R}^{m\times n}}\ \Big\{\|\mathcal{A}(X)-b\|_2^2+\lambda\|X\|_{1/2}^{1/2}\Big\}. \qquad (3.7)$$
For any $\lambda,\mu>0$ and $Z\in\mathbb{R}^{m\times n}$, let
$$C_\lambda(X):=\|\mathcal{A}(X)-b\|_2^2+\lambda\|X\|_{1/2}^{1/2}, \qquad (3.8)$$
$$C_{\lambda,\mu}(X,Z):=\mu\big(C_\lambda(X)-\|\mathcal{A}(X)-\mathcal{A}(Z)\|_2^2\big)+\|X-Z\|_F^2, \qquad (3.9)$$
$$B_\mu(Z):=Z+\mu\mathcal{A}^*(b-\mathcal{A}(Z)). \qquad (3.10)$$

Lemma 3.7 If $X_s\in\mathbb{R}^{m\times n}$ is a global minimizer of $C_{\lambda,\mu}(X,Z)$ for any fixed $\lambda$, $\mu$ and $Z$, then $X_s$ can be analytically expressed by
$$X_s=\mathcal{H}_{\lambda\mu}(B_\mu(Z)). \qquad (3.11)$$

Proof. Note that $C_{\lambda,\mu}(X,Z)$ can be rewritten as
$$\begin{aligned} C_{\lambda,\mu}(X,Z)&=\mu\big(\|\mathcal{A}(X)-b\|_2^2+\lambda\|X\|_{1/2}^{1/2}-\|\mathcal{A}(X)-\mathcal{A}(Z)\|_2^2\big)+\|X-Z\|_F^2\\ &=\|X\|_F^2+2\mu\langle\mathcal{A}(X),\mathcal{A}(Z)\rangle-2\mu\langle\mathcal{A}(X),b\rangle-2\langle X,Z\rangle+\lambda\mu\|X\|_{1/2}^{1/2}+\|Z\|_F^2+\mu\|b\|_2^2-\mu\|\mathcal{A}(Z)\|_2^2\\ &=\|X\|_F^2-2\big\langle X,\,Z+\mu\mathcal{A}^*(b-\mathcal{A}(Z))\big\rangle+\lambda\mu\|X\|_{1/2}^{1/2}+\|Z\|_F^2+\mu\|b\|_2^2-\mu\|\mathcal{A}(Z)\|_2^2\\ &=\|X\|_F^2-2\langle X,B_\mu(Z)\rangle+\lambda\mu\|X\|_{1/2}^{1/2}+\|Z\|_F^2+\mu\|b\|_2^2-\mu\|\mathcal{A}(Z)\|_2^2\\ &=\|X-B_\mu(Z)\|_F^2+\lambda\mu\|X\|_{1/2}^{1/2}+\|Z\|_F^2+\mu\|b\|_2^2-\mu\|\mathcal{A}(Z)\|_2^2-\|B_\mu(Z)\|_F^2. \end{aligned}$$
This implies that minimizing $C_{\lambda,\mu}(X,Z)$ for any fixed $\lambda$, $\mu$ and $Z$ is equivalent to solving
$$\min_{X\in\mathbb{R}^{m\times n}}\ \big\{\|X-B_\mu(Z)\|_F^2+\lambda\mu\|X\|_{1/2}^{1/2}\big\}.$$
By applying Lemma 3.6 with $Y=B_\mu(Z)$, we obtain expression (3.11).

Lemma 3.8 Let $\lambda$ and $\mu$ be two fixed numbers satisfying $\lambda>0$ and $0<\mu\le\|\mathcal{A}\|^{-2}$. If $X^*$ is a global minimizer of $C_\lambda(X)$, then $X^*$ is also a global minimizer of $C_{\lambda,\mu}(X,X^*)$, that is,
$$C_{\lambda,\mu}(X^*,X^*)\le C_{\lambda,\mu}(X,X^*)\quad\text{for all } X\in\mathbb{R}^{m\times n}. \qquad (3.12)$$

Proof. Since $0<\mu\le\|\mathcal{A}\|^{-2}$, we have $\|X-X^*\|_F^2-\mu\|\mathcal{A}(X)-\mathcal{A}(X^*)\|_2^2\ge 0$. Hence, for any $X\in\mathbb{R}^{m\times n}$,
$$\begin{aligned} C_{\lambda,\mu}(X,X^*)&=\mu\big(C_\lambda(X)-\|\mathcal{A}(X)-\mathcal{A}(X^*)\|_2^2\big)+\|X-X^*\|_F^2\\ &=\mu\,C_\lambda(X)+\big(\|X-X^*\|_F^2-\mu\|\mathcal{A}(X)-\mathcal{A}(X^*)\|_2^2\big)\\ &\ge\mu\,C_\lambda(X)\ \ge\ \mu\,C_\lambda(X^*)\ =\ C_{\lambda,\mu}(X^*,X^*), \end{aligned}$$
where the last inequality is due to the fact that $X^*$ is a global minimizer of $C_\lambda(X)$. The proof is thus complete.

By applying Lemmas 3.7 and 3.8, we can now derive the main result of this section.

Theorem 3.9 Let $\lambda>0$ and $0<\mu\le\|\mathcal{A}\|^{-2}$. Let $X^*$ be a global minimizer of problem (1.7) and let $B_\mu(X^*)=X^*+\mu\mathcal{A}^*(b-\mathcal{A}(X^*))$ admit the SVD
$$B_\mu(X^*)=U^*\operatorname{Diag}\big(\sigma(B_\mu(X^*))\big){V^*}^T. \qquad (3.13)$$
Then $X^*$ satisfies the fixed point equation
$$X^*=\mathcal{H}_{\lambda\mu}(B_\mu(X^*)). \qquad (3.14)$$

In particular, one can express
$$[\sigma(X^*)]_i=h_{\lambda\mu}\big([\sigma(B_\mu(X^*))]_i\big)=\begin{cases} h_{\lambda\mu,1/2}\big([\sigma(B_\mu(X^*))]_i\big), & \text{if } [\sigma(B_\mu(X^*))]_i>\frac{\sqrt[3]{54}}{4}(\lambda\mu)^{2/3},\\[2pt] \in\big\{h_{\lambda\mu,1/2}([\sigma(B_\mu(X^*))]_i),\,0\big\}, & \text{if } [\sigma(B_\mu(X^*))]_i=\frac{\sqrt[3]{54}}{4}(\lambda\mu)^{2/3},\\[2pt] 0, & \text{if } [\sigma(B_\mu(X^*))]_i<\frac{\sqrt[3]{54}}{4}(\lambda\mu)^{2/3}. \end{cases} \qquad (3.15)$$
Moreover, we have either
$$[\sigma(X^*)]_i\ge\frac{\sqrt[3]{54}}{6}(\lambda\mu)^{2/3}\quad\text{or}\quad[\sigma(X^*)]_i=0. \qquad (3.16)$$

Proof. Since $X^*$ is a global minimizer of $C_\lambda(X)$, by Lemma 3.8, $X^*$ is also a global minimizer of $C_{\lambda,\mu}(X,X^*)$. Consequently, by Lemma 3.7, $X^*$ satisfies equation (3.14), and (3.15) is a componentwise re-expression of (3.14). According to (3.2)-(3.4), by direct computation we have
$$\lim_{t\to\frac{\sqrt[3]{54}}{4}(\lambda\mu)^{2/3}}\varphi_{\lambda\mu}(t)=\frac{\pi}{4}\qquad\text{and}\qquad\lim_{t\to\frac{\sqrt[3]{54}}{4}(\lambda\mu)^{2/3}}h_{\lambda\mu}(t)=\frac{\sqrt[3]{54}}{6}(\lambda\mu)^{2/3}.$$
This limit, together with the strict monotonicity of $h_{\lambda\mu}$ on $t>\frac{\sqrt[3]{54}}{4}(\lambda\mu)^{2/3}$ (Lemma 3.2), implies that $[\sigma(X^*)]_i>\frac{\sqrt[3]{54}}{6}(\lambda\mu)^{2/3}$ whenever $[\sigma(B_\mu(X^*))]_i>\frac{\sqrt[3]{54}}{4}(\lambda\mu)^{2/3}$. The last case of (3.15) shows that $[\sigma(X^*)]_i=0$ whenever $[\sigma(B_\mu(X^*))]_i<\frac{\sqrt[3]{54}}{4}(\lambda\mu)^{2/3}$. Thus, (3.16) is derived.

Theorem 3.9 provides not only the lower bound estimate $\frac{\sqrt[3]{54}}{6}(\lambda\mu)^{2/3}$ for the nonzero singular values of the global minimizers of the $S_{1/2}$ regularization problem, but also a globally necessary optimality condition in the form of a fixed point equation associated with the matrix half thresholding operator $\mathcal{H}_{\lambda\mu}(\cdot)$. On the one hand, it is analogous to the fixed point condition of the nuclear norm regularization solution associated with the so-called singular value shrinkage operator (see, e.g., [4, 29]). On the other hand, the half thresholding operator involved here is more complicated than the singular value shrinkage operator, because our minimization problem is nonconvex, nonsmooth and non-Lipschitz.

Definition 3.10 We call $X^*$ a global stationary point of problem (1.7) if there exists $0<\mu\le\|\mathcal{A}\|^{-2}$ such that $X^*$ satisfies the fixed point equation (3.14).

4 Fixed point iteration and its convergence

According to the fixed point equation (3.14), a fixed point iterative formula for the $S_{1/2}$ regularization problem (1.7) can be naturally proposed as follows: given $X_0$,
$$X_{k+1}=\mathcal{H}_{\lambda\mu}\big(X_k+\mu\mathcal{A}^*(b-\mathcal{A}(X_k))\big). \qquad (4.1)$$
To simplify the iterations and to favor low rank solutions, we slightly adjust $h_{\lambda\mu}$ in (4.1) as follows:
$$h_{\lambda\mu}(t):=\begin{cases} h_{\lambda\mu,1/2}(t), & \text{if } t>\frac{\sqrt[3]{54}}{4}(\lambda\mu)^{2/3},\\[2pt] 0, & \text{otherwise}. \end{cases} \qquad (4.2)$$
The adjustment here is to choose $h_{\lambda\mu}(t)=0$ when $t=\frac{\sqrt[3]{54}}{4}(\lambda\mu)^{2/3}$.
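A sketch of iteration (4.1) in NumPy (the names are ours; the proximal map `prox` stands for the matrix half thresholding operator with the adjusted rule (4.2), e.g. `matrix_half_threshold` from the sketch after Lemma 3.6; the tolerance and iteration cap are assumed practical choices, not from the paper).

```python
import numpy as np

def fixed_point_step(X, A_op, A_adj, b, lam, mu, prox):
    """One iteration of (4.1): X_{k+1} = H_{lam*mu}(B_mu(X_k)) with
    B_mu(X) = X + mu * A^*(b - A(X)); prox(B, nu) applies the singular value
    half thresholding with parameter nu = lam * mu."""
    B = X + mu * A_adj(b - A_op(X))
    return prox(B, lam * mu)

def fixed_point_iterate(X0, A_op, A_adj, b, lam, mu, prox, tol=1e-6, max_iter=500):
    """Iterate (4.1) until the relative change ||X_{k+1} - X_k||_F is small."""
    X = X0
    for _ in range(max_iter):
        X_new = fixed_point_step(X, A_op, A_adj, b, lam, mu, prox)
        if np.linalg.norm(X_new - X, 'fro') <= tol * max(1.0, np.linalg.norm(X, 'fro')):
            return X_new
        X = X_new
    return X
```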

Next, let us analyze the convergence of the above fixed point iteration.

Theorem 4.1 Given $\lambda>0$, choose $0<\mu<\|\mathcal{A}\|^{-2}$. Let $\{X_k\}$ be the sequence generated by iteration (4.1). Then:
(i) $\{C_\lambda(X_k)\}$ is strictly monotonically decreasing and converges to $C_\lambda(X^*)$, where $X^*$ is any accumulation point of $\{X_k\}$.
(ii) $\{X_k\}$ is asymptotically regular, that is, $\lim_{k\to\infty}\|X_{k+1}-X_k\|_F=0$.
(iii) Any accumulation point of $\{X_k\}$ is a global stationary point of problem (1.7).

Proof. (i) Let $C_\lambda(X)$, $C_{\lambda,\mu}(X,Z)$ and $B_\mu(Z)$ be defined by (3.8)-(3.10), and let $B_\mu(Z)$ admit the SVD $B_\mu(Z)=U\operatorname{Diag}(\sigma)V^T$ with $U\in\mathbb{R}^{m\times r}$, $V\in\mathbb{R}^{n\times r}$ and $\sigma\in\mathbb{R}^r_{++}$. From Lemma 3.7, we have
$$C_{\lambda,\mu}\big(\mathcal{H}_{\lambda\mu}(B_\mu(Z)),Z\big)=\min_X C_{\lambda,\mu}(X,Z),$$
and therefore
$$C_{\lambda,\mu}(X_{k+1},X_k)=\min_X C_{\lambda,\mu}(X,X_k), \qquad (4.3)$$
where $X_{k+1}=\mathcal{H}_{\lambda\mu}(B_\mu(X_k))=U_k\operatorname{Diag}(H_{\lambda\mu}(\sigma^k))V_k^T$ and $U_k\operatorname{Diag}(\sigma^k)V_k^T$ is the SVD of $B_\mu(X_k)$. Since $0<\mu<\|\mathcal{A}\|^{-2}$, we have
$$\|\mathcal{A}(X_{k+1})-\mathcal{A}(X_k)\|_2^2-\frac{1}{\mu}\|X_{k+1}-X_k\|_F^2<0.$$
Hence,
$$\begin{aligned} C_\lambda(X_{k+1})&=\frac{1}{\mu}\Big(C_{\lambda,\mu}(X_{k+1},X_k)-\|X_{k+1}-X_k\|_F^2\Big)+\|\mathcal{A}(X_{k+1})-\mathcal{A}(X_k)\|_2^2\\ &\le\frac{1}{\mu}\Big(C_{\lambda,\mu}(X_k,X_k)-\|X_{k+1}-X_k\|_F^2\Big)+\|\mathcal{A}(X_{k+1})-\mathcal{A}(X_k)\|_2^2\\ &=\frac{1}{\mu}C_{\lambda,\mu}(X_k,X_k)+\Big(\|\mathcal{A}(X_{k+1})-\mathcal{A}(X_k)\|_2^2-\frac{1}{\mu}\|X_{k+1}-X_k\|_F^2\Big)\\ &<\frac{1}{\mu}C_{\lambda,\mu}(X_k,X_k)=C_\lambda(X_k), \end{aligned}$$
which shows that $\{C_\lambda(X_k)\}$ is strictly monotonically decreasing. Since $\{C_\lambda(X_k)\}$ is bounded from below, it converges to a constant $C^*$. From $\{X_k\}\subset\{X:C_\lambda(X)\le C_\lambda(X_0)\}$, which is bounded, it follows that $\{X_k\}$ is bounded and therefore has at least one accumulation point. Let $X^*$ be an accumulation point of $\{X_k\}$. By the continuity of $C_\lambda(X)$ and the convergence of $\{C_\lambda(X_k)\}$, we get $C_\lambda(X_k)\to C^*=C_\lambda(X^*)$ as $k\to+\infty$.

(ii) Since $0<\mu<\|\mathcal{A}\|^{-2}$, we have $0<\delta:=1-\mu\|\mathcal{A}\|^2<1$ and
$$\|X_{k+1}-X_k\|_F^2\le\frac{1}{\delta}\Big(\|X_{k+1}-X_k\|_F^2-\mu\|\mathcal{A}(X_{k+1})-\mathcal{A}(X_k)\|_2^2\Big).$$
From (3.8), (3.9) and (4.3), we derive
$$\mu\big[C_\lambda(X_k)-C_\lambda(X_{k+1})\big]=C_{\lambda,\mu}(X_k,X_k)-\mu C_\lambda(X_{k+1})\ \ge\ C_{\lambda,\mu}(X_{k+1},X_k)-\mu C_\lambda(X_{k+1})=\|X_{k+1}-X_k\|_F^2-\mu\|\mathcal{A}(X_{k+1})-\mathcal{A}(X_k)\|_2^2.$$

The above two inequalities yield that, for any positive integer $K$,
$$\sum_{k=0}^{K}\|X_{k+1}-X_k\|_F^2\le\frac{1}{\delta}\sum_{k=0}^{K}\Big(\|X_{k+1}-X_k\|_F^2-\mu\|\mathcal{A}(X_{k+1})-\mathcal{A}(X_k)\|_2^2\Big)\le\frac{\mu}{\delta}\sum_{k=0}^{K}\big(C_\lambda(X_k)-C_\lambda(X_{k+1})\big)=\frac{\mu}{\delta}\big(C_\lambda(X_0)-C_\lambda(X_{K+1})\big)\le\frac{\mu}{\delta}C_\lambda(X_0).$$
Hence, $\sum_{k=0}^{\infty}\|X_{k+1}-X_k\|_F^2<+\infty$, and so $\|X_{k+1}-X_k\|_F\to 0$ as $k\to+\infty$. Thus, $\{X_k\}$ is asymptotically regular.

(iii) Let $\{X_{k_j}\}$ be a convergent subsequence of $\{X_k\}$ and let $X^*$ be its limit point, i.e.,
$$X_{k_j}\to X^*,\quad\text{as } k_j\to+\infty. \qquad (4.4)$$
From this limit, we derive
$$B_\mu(X_{k_j})=X_{k_j}+\mu\mathcal{A}^*(b-\mathcal{A}(X_{k_j}))\ \to\ X^*+\mu\mathcal{A}^*(b-\mathcal{A}(X^*))=B_\mu(X^*),\quad\text{as } k_j\to+\infty,$$
i.e.,
$$U_{k_j}\operatorname{Diag}(\sigma^{k_j})V_{k_j}^T\ \to\ U^*\operatorname{Diag}(\sigma^*){V^*}^T,\quad\text{as } k_j\to+\infty, \qquad (4.5)$$
where $B_\mu(X_{k_j})=U_{k_j}\operatorname{Diag}(\sigma^{k_j})V_{k_j}^T$ and $B_\mu(X^*)=U^*\operatorname{Diag}(\sigma^*){V^*}^T$ are the SVDs of $B_\mu(X_{k_j})$ and $B_\mu(X^*)$, respectively. According to (4.5) and [22, Corollary 7.3.8], we have
$$[\sigma^{k_j}]_i\to[\sigma^*]_i\quad\text{for each } i=1,\dots,r,\quad\text{as } k_j\to+\infty, \qquad (4.6)$$
where $r$ is the rank of $B_\mu(X^*)$. By the selection principle (see, e.g., [22, Lemma 2.1.8]), we can suppose that
$$U_{k_j}\to\bar U,\quad \operatorname{Diag}(\sigma^{k_j})\to\operatorname{Diag}(\sigma^*),\quad V_{k_j}\to\bar V,\quad\text{as } k_j\to+\infty, \qquad (4.7)$$
for some $\bar U\in\mathbb{R}^{m\times r}$ and $\bar V\in\mathbb{R}^{n\times r}$, both with orthonormal columns. From (4.7), we get $U_{k_j}\operatorname{Diag}(\sigma^{k_j})V_{k_j}^T\to\bar U\operatorname{Diag}(\sigma^*)\bar V^T$. This, together with (4.5), implies
$$\bar U\operatorname{Diag}(\sigma^*)\bar V^T=U^*\operatorname{Diag}(\sigma^*){V^*}^T=B_\mu(X^*). \qquad (4.8)$$
The limit (4.4) and the asymptotic regularity of $\{X_k\}$ imply
$$\|X_{k_j+1}-X^*\|_F\le\|X_{k_j+1}-X_{k_j}\|_F+\|X_{k_j}-X^*\|_F\to 0,\quad\text{as } k_j\to+\infty,$$
which verifies that $\{X_{k_j+1}\}$ also converges to $X^*$. Note that $X_{k_j+1}=U_{k_j}\operatorname{Diag}(H_{\lambda\mu}(\sigma^{k_j}))V_{k_j}^T$, which together with $X_{k_j+1}\to X^*$ yields
$$U_{k_j}\operatorname{Diag}(H_{\lambda\mu}(\sigma^{k_j}))V_{k_j}^T\to X^*,\quad\text{as } k_j\to+\infty. \qquad (4.9)$$

If it holds that
$$h_{\lambda\mu}([\sigma^{k_j}]_i)\to h_{\lambda\mu}([\sigma^*]_i)\quad\text{for each } i=1,2,\dots,r,\quad\text{as } k_j\to+\infty, \qquad (4.10)$$
then from (4.7), (4.10) and (4.8), we get
$$U_{k_j}\operatorname{Diag}(H_{\lambda\mu}(\sigma^{k_j}))V_{k_j}^T\ \to\ \bar U\operatorname{Diag}(H_{\lambda\mu}(\sigma^*))\bar V^T=\mathcal{H}_{\lambda\mu}(B_\mu(X^*)),\quad\text{as } k_j\to+\infty,$$
where the last equality is due to the well-definedness¹ of $\mathcal{H}_{\lambda\mu}(\cdot)$. This limit, together with (4.9), gives $X^*=\mathcal{H}_{\lambda\mu}(B_\mu(X^*))$, that is, $X^*$ is a global stationary point of problem (1.7).

It remains to prove that (4.10) is true. For $i=1,\dots,r$, if $[\sigma^*]_i<\frac{\sqrt[3]{54}}{4}(\lambda\mu)^{2/3}$, then by (4.6), $[\sigma^{k_j}]_i<\frac{\sqrt[3]{54}}{4}(\lambda\mu)^{2/3}$ when $k_j$ is sufficiently large. This inequality, together with the definition of $h_{\lambda\mu}$ in (4.2), gives $h_{\lambda\mu}([\sigma^{k_j}]_i)=0\to h_{\lambda\mu}([\sigma^*]_i)=0$ as $k_j\to+\infty$.

If $[\sigma^*]_i>\frac{\sqrt[3]{54}}{4}(\lambda\mu)^{2/3}$, then by (4.6), $[\sigma^{k_j}]_i>\frac{\sqrt[3]{54}}{4}(\lambda\mu)^{2/3}$ when $k_j$ is sufficiently large. Note that although $h_{\lambda\mu}(\cdot)$ defined by (4.2) is not continuous on $[0,+\infty)$, it is continuous on $\big(\frac{\sqrt[3]{54}}{4}(\lambda\mu)^{2/3},+\infty\big)$. So it follows from $[\sigma^{k_j}]_i\to[\sigma^*]_i$ that $h_{\lambda\mu}([\sigma^{k_j}]_i)\to h_{\lambda\mu}([\sigma^*]_i)$ as $k_j\to+\infty$.

If $[\sigma^*]_i=\frac{\sqrt[3]{54}}{4}(\lambda\mu)^{2/3}$, then since $[\sigma^{k_j}]_i\to[\sigma^*]_i$, there are two possible cases.

Case 1: There is a subsequence of $\{[\sigma^{k_j}]_i\}$, say $\{[\sigma^{k_{j_m}}]_i\}$, converging to $[\sigma^*]_i$ such that $[\sigma^{k_{j_m}}]_i\le[\sigma^*]_i$ for each $k_{j_m}$. In this case, we have $h_{\lambda\mu}([\sigma^{k_{j_m}}]_i)=0\to h_{\lambda\mu}([\sigma^*]_i)=0$ as $k_{j_m}\to+\infty$.

Case 2: There is a subsequence of $\{[\sigma^{k_j}]_i\}$, say $\{[\sigma^{k_{j_n}}]_i\}$, converging to $[\sigma^*]_i$ such that $[\sigma^{k_{j_n}}]_i>[\sigma^*]_i=\frac{\sqrt[3]{54}}{4}(\lambda\mu)^{2/3}$ for each $k_{j_n}$. However, we will verify that this case can never happen as long as $\mu$ is chosen appropriately. If Case 2 happens, there is a large integer $N_1$ such that $[\sigma^{k_{j_n}}]_i\in\big(\frac{\sqrt[3]{54}}{4}(\lambda\mu)^{2/3},\ \frac{\sqrt[3]{54}}{3}(\lambda\mu)^{2/3}\big)$ holds for any $k_{j_n}\ge N_1$. By (ii), $\|X_{k_{j_n}+1}-X_{k_{j_n}}\|_F\to 0$ as $k_{j_n}\to+\infty$. Then there is a large integer $N_2\ge N_1$ such that
$$[\sigma^{k_{j_n}+1}]_i\in\Big(\tfrac{\sqrt[3]{54}}{4}(\lambda\mu)^{2/3},\ \tfrac{\sqrt[3]{54}}{3}(\lambda\mu)^{2/3}\Big) \qquad (4.11)$$
holds for any $k_{j_n}\ge N_2$.

¹ The matrix half thresholding operator $\mathcal{H}_{\lambda\mu}:\mathbb{R}^{m\times n}\to\mathbb{R}^{m\times n}$ here is in fact a non-symmetric Löwner operator [38] associated with the half thresholding function $h_{\lambda\mu}:\mathbb{R}\to\mathbb{R}$. The non-symmetric Löwner operator $\mathcal{H}_{\lambda\mu}:\mathbb{R}^{m\times n}\to\mathbb{R}^{m\times n}$ is called well-defined if it is independent of the choice of the matrices $U$ and $V$ in the SVD. In other words, if $Y\in\mathbb{R}^{m\times n}$ has two different SVDs, say $Y=U_1\operatorname{Diag}(\sigma)V_1^T=U_2\operatorname{Diag}(\sigma)V_2^T$, then $\mathcal{H}_{\lambda\mu}(Y)=U_1\operatorname{Diag}(h_{\lambda\mu}(\sigma_1),\dots,h_{\lambda\mu}(\sigma_m))V_1^T=U_2\operatorname{Diag}(h_{\lambda\mu}(\sigma_1),\dots,h_{\lambda\mu}(\sigma_m))V_2^T$. Theorem 1 of Lecture III in [38] proves that a non-symmetric Löwner operator $H:\mathbb{R}^{m\times n}\to\mathbb{R}^{m\times n}$ associated with a scalar-valued function $h:\mathbb{R}_+\to\mathbb{R}_+$ is well-defined if and only if $h(0)=0$. By this theorem, our matrix half thresholding operator $\mathcal{H}_{\lambda\mu}$ is well-defined since $h_{\lambda\mu}(0)=0$.

On the other hand, since $B_\mu(X_{k_{j_n}})=X_{k_{j_n}}+\mu\mathcal{A}^*(b-\mathcal{A}(X_{k_{j_n}}))$ is continuous in $\mu$ and $B_\mu(X_{k_{j_n}})\to X_{k_{j_n}}$ as $\mu\to 0$, we know that if $\mu$ is chosen sufficiently small, $[\sigma(B_\mu(X_{k_{j_n}}))]_i$ will be close to $[\sigma^{k_{j_n}}]_i$. Let $\mu$ be chosen such that $[\sigma(B_\mu(X_{k_{j_n}}))]_i\in\big(\frac{\sqrt[3]{54}}{4}(\lambda\mu)^{2/3},\ \frac{\sqrt[3]{54}}{3}(\lambda\mu)^{2/3}\big)$ holds for any $k_{j_n}\ge N_2$. According to (3.2)-(3.4), by direct computation we know that
$$\lim_{t\to\frac{\sqrt[3]{54}}{4}(\lambda\mu)^{2/3}}\varphi_{\lambda\mu}(t)=\frac{\pi}{4}\qquad\text{and}\qquad\lim_{t\to\frac{\sqrt[3]{54}}{4}(\lambda\mu)^{2/3}}h_{\lambda\mu}(t)=\frac{\sqrt[3]{54}}{6}(\lambda\mu)^{2/3}.$$
Note that $[\sigma^{k_{j_n}+1}]_i=h_{\lambda\mu}\big([\sigma(B_\mu(X_{k_{j_n}}))]_i\big)$ and that $h_{\lambda\mu}(\cdot)$ is increasing on $\big(\frac{\sqrt[3]{54}}{4}(\lambda\mu)^{2/3},+\infty\big)$ (Lemma 3.2); then there is a large integer $N_3\ge N_2$ such that
$$[\sigma^{k_{j_n}+1}]_i=h_{\lambda\mu}\big([\sigma(B_\mu(X_{k_{j_n}}))]_i\big)\in\Big(\tfrac{\sqrt[3]{54}}{6}(\lambda\mu)^{2/3},\ \tfrac{\sqrt[3]{54}}{4}(\lambda\mu)^{2/3}\Big) \qquad (4.12)$$
holds for any $k_{j_n}\ge N_3$. One can see that (4.12) contradicts (4.11). This contradiction shows that Case 2 can never happen as long as $\mu$ is chosen appropriately. Therefore, (4.10) is true, and the proof is complete.

5 Setting of parameters and fixed point continuation

In this section, we discuss the problem of parameter selection in our algorithm. As is well known, the quality of solutions to regularized optimization problems depends heavily on the setting of the regularization parameter $\lambda$, but the selection of proper parameters is a very hard problem and there is no optimal rule in general. Nevertheless, when some prior information (e.g., low rank) is known for a problem, it is realistic to set the regularization parameter more reasonably.

5.1 Location of the optimal regularization parameter

We begin by locating the optimal $\lambda^*$, which then serves as the basis of the parameter setting strategy used in the algorithm to be proposed. Specifically, suppose that a problem can be formulated in the $S_{1/2}$ regularization form (1.7) whose solutions are matrices of rank $r$; thus, we are required to solve the $S_{1/2}$ regularization problem restricted to the subregion $\{X\in\mathbb{R}^{m\times n}\mid\operatorname{rank}(X)=r\}$. For any $\mu$, denote $B_\mu(X)=X+\mu\mathcal{A}^*(b-\mathcal{A}(X))$. Assume $X^*$ is a solution to the $S_{1/2}$ regularization problem and $\sigma(B_\mu(X^*))$ is arranged in nonincreasing order. By Theorem 3.9 (particularly (3.16)) and (4.2), we have
$$[\sigma(B_\mu(X^*))]_i>\frac{\sqrt[3]{54}}{4}(\lambda^*\mu)^{2/3}\ \Longleftrightarrow\ [\sigma(X^*)]_i>\frac{\sqrt[3]{54}}{6}(\lambda^*\mu)^{2/3},\quad i\in\{1,2,\dots,r\},$$
and
$$[\sigma(B_\mu(X^*))]_i\le\frac{\sqrt[3]{54}}{4}(\lambda^*\mu)^{2/3}\ \Longleftrightarrow\ [\sigma(X^*)]_i=0,\quad i\in\{r+1,r+2,\dots,m\},$$
which implies
$$\frac{\sqrt{96}}{9\mu}\big([\sigma(B_\mu(X^*))]_{r+1}\big)^{3/2}\ \le\ \lambda^*\ <\ \frac{\sqrt{96}}{9\mu}\big([\sigma(B_\mu(X^*))]_{r}\big)^{3/2}.$$

The above estimate provides an exact location of where the optimal parameter lies. We can then take
$$\lambda^*=\frac{\sqrt{96}}{9\mu}\Big((1-\alpha)\big([\sigma(B_\mu(X^*))]_{r+1}\big)^{3/2}+\alpha\big([\sigma(B_\mu(X^*))]_{r}\big)^{3/2}\Big)$$
with any $\alpha\in[0,1)$. In particular, a most reliable choice of $\lambda^*$ is
$$\lambda^*=\frac{\sqrt{96}}{9\mu}\big([\sigma(B_\mu(X^*))]_{r+1}\big)^{3/2}. \qquad (5.1)$$
Of course, it may not be the best choice, since the larger $\lambda$ is, the larger the threshold value $\frac{\sqrt[3]{54}}{4}(\lambda\mu)^{2/3}$ becomes, and the lower the rank of the solution produced by the thresholding algorithm. We also note that formula (5.1) is valid for any fixed $\mu$; we will use it with a fixed $\mu_0$ satisfying $0<\mu_0<\|\mathcal{A}\|^{-2}$ below. In applications, we may use $X_k$ instead of the true solution $X^*$ and the rank of $X_k$ instead of $r+1$, that is, we can take
$$\lambda_{k+1}=\frac{\sqrt{96}}{9\mu_0}\big([\sigma(X_k)]_{r_k}\big)^{3/2}, \qquad (5.2)$$
where $r_k$ is the rank of $X_k$. More often, we can also take
$$\lambda_{k+1}=\max\Big\{\bar\lambda,\ \min\Big\{\eta\lambda_k,\ \frac{\sqrt{96}}{9\mu_0}\big([\sigma(X_k)]_{r_k}\big)^{3/2}\Big\}\Big\}, \qquad (5.3)$$
where $\bar\lambda$ is a sufficiently small positive real number, $\eta\in(0,1)$ is a constant, and $r_k$ is the rank of $X_k$. In this case, $\{\lambda_k\}$ remains monotonically decreasing. In the next subsection, one will see that (5.3) may accelerate the iteration.
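A small NumPy sketch of the continuation rule (5.3) follows (the function name and the numerical rank cutoff `tol` are our choices; the defaults $\eta=1/4$ and $\bar\lambda=10^{-4}$ are taken from Table 1 in Section 7).

```python
import numpy as np

def update_lambda(X_k, lam_k, mu0, eta=0.25, lam_bar=1e-4, tol=1e-8):
    """Rule (5.3): lam_{k+1} = max(lam_bar, min(eta*lam_k,
    sqrt(96)/(9*mu0) * sigma_{r_k}(X_k)^{3/2})), where r_k is the (numerical)
    rank of X_k and sigma_{r_k}(X_k) is its smallest nonzero singular value."""
    s = np.linalg.svd(X_k, compute_uv=False)
    s_pos = s[s > tol]
    sigma_rk = s_pos[-1] if s_pos.size > 0 else 0.0
    candidate = np.sqrt(96.0) / (9.0 * mu0) * sigma_rk ** 1.5
    return max(lam_bar, min(eta * lam_k, candidate))
```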

5.2 Interpretation as a method of fixed point continuation

In this subsection, we recast (5.3) as a continuation technique (i.e., a homotopy approach) that accelerates the convergence of the fixed point iteration. In [21], Hale et al. describe a continuation technique to accelerate the convergence of the fixed point iteration for the $\ell_1$ regularization problem. Inspired by this work, Ma et al. [29] provide a similar continuation technique to accelerate the convergence of the fixed point iteration for the nuclear norm regularization problem. As shown in [21, 29], this continuation technique considerably improves the convergence speed of fixed point iterations. The main idea of their continuation technique, explained in our context, is to choose a decreasing sequence $\{\lambda_k\}$: $\lambda_1>\lambda_2>\dots>\lambda_L=\bar\lambda>0$, and then, in the $k$th iteration, use $\lambda=\lambda_k$. Therefore, formula (5.3) coincides with this continuation technique. Generally speaking, our algorithm can be regarded as a fixed point continuation algorithm, but one applied to a nonsmooth, nonconvex and non-Lipschitz optimization problem. Thus, a fixed point iterative algorithm based on the half norm of matrices for problem (1.7) can be specified as follows.

Algorithm 5.2. Half Norm Fixed Point algorithm (HFP algorithm)

Given the linear operator $\mathcal{A}:\mathbb{R}^{m\times n}\to\mathbb{R}^p$ and the vector $b\in\mathbb{R}^p$; set the parameters $\mu_0>0$, $\bar\lambda>0$ and $\eta\in(0,1)$.
- Initialize: choose the initial values $\{X_0,\lambda_1\}$ with $\lambda_1\ge\bar\lambda$; set $X=X_0$ and $\lambda=\lambda_1$.
- for $k=1:\text{maxiter}$, do $\lambda=\lambda_k$,
  - while NOT converged, do
    - compute $B=X+\mu_0\mathcal{A}^*(b-\mathcal{A}(X))$ and its SVD, say $B=U\operatorname{Diag}(\sigma)V^T$;
    - compute $X=U\operatorname{Diag}(H_{\lambda\mu_0}(\sigma))V^T$;
  - end while, and output $X_k$, $\sigma^k$, $r_k=\operatorname{rank}(X_k)$;
  - set $\lambda_{k+1}=\max\big\{\bar\lambda,\ \min\big\{\eta\lambda_k,\ \frac{\sqrt{96}}{9\mu_0}([\sigma(X_k)]_{r_k})^{3/2}\big\}\big\}$;
  - if $\lambda_{k+1}=\bar\lambda$, return;
- end for

In Algorithm 5.2, the positive integer maxiter is large enough that convergence of the outer loop can be ensured.

5.3 Stopping criteria for inner loops

Note that in the half norm fixed point algorithm, the $k$th inner loop solves problem (1.7) for a fixed $\lambda=\lambda_k$. We must determine when to stop this inner iteration and move on to the next one. When $X_k$ gets close to an optimal solution $X^*$, the distance between $X_k$ and $X_{k+1}$ should become very small. Hence, we can use the criterion
$$\frac{\|X_{k+1}-X_k\|_F}{\max\{1,\|X_k\|_F\}}<\text{xtol}, \qquad (5.4)$$
where xtol is a small positive number. Besides this stopping criterion, we use $I_m$ to control the maximum number of inner loops, i.e., if the stopping rule (5.4) is not satisfied after $I_m$ iterations, we terminate the subproblem and update $\lambda$ to start the next subproblem.
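Putting the pieces together, the following is a compact, hedged sketch of Algorithm 5.2 in NumPy, using a full SVD (the HFPA variant of Section 6 would replace it with an approximate SVD). The parameter defaults and the initialization $X_0=\mathcal{A}^*(b)$, $\lambda_1=\min\{3,mn/p\}\,\|\mathcal{A}^*(b)\|_2$ follow Table 1 in Section 7; this is an illustrative re-implementation, not the code used in the experiments.

```python
import numpy as np

def half_threshold_sv(s, nu):
    """Half thresholding of a vector of singular values, parameter nu = lam*mu0."""
    thresh = (54.0 ** (1.0 / 3.0) / 4.0) * nu ** (2.0 / 3.0)
    out = np.zeros_like(s)
    keep = s > thresh
    if np.any(keep):
        phi = np.arccos((nu / 8.0) * (s[keep] / 3.0) ** (-1.5))
        out[keep] = (2.0 / 3.0) * s[keep] * (1.0 + np.cos(2.0 * np.pi / 3.0 - 2.0 * phi / 3.0))
    return out

def hfp(A_op, A_adj, b, mu0, lam_bar=1e-4, eta=0.25, maxiter=10000, I_m=10, xtol=1e-4):
    """Sketch of Algorithm 5.2 (HFP).  A_op/A_adj implement A and A^*;
    mu0 should satisfy 0 < mu0 < 1/||A||^2 (for matrix completion ||A|| = 1)."""
    X = A_adj(b)                                                        # X_0 = A^*(b)
    lam = max(lam_bar, min(3.0, X.size / b.size) * np.linalg.norm(X, 2))  # lam_1
    for _ in range(maxiter):
        for _ in range(I_m):                                            # inner loop, fixed lam
            B = X + mu0 * A_adj(b - A_op(X))
            U, s, Vt = np.linalg.svd(B, full_matrices=False)
            X_new = (U * half_threshold_sv(s, lam * mu0)) @ Vt
            small = np.linalg.norm(X_new - X, 'fro') / max(1.0, np.linalg.norm(X, 'fro')) < xtol
            X = X_new
            if small:
                break
        s_X = np.linalg.svd(X, compute_uv=False)                        # continuation step (5.3)
        s_pos = s_X[s_X > 1e-8]
        sigma_rk = s_pos[-1] if s_pos.size > 0 else 0.0
        lam = max(lam_bar, min(eta * lam, np.sqrt(96.0) / (9.0 * mu0) * sigma_rk ** 1.5))
        if lam == lam_bar:
            break
    return X
```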

6 HFPA algorithm: HFP algorithm with an approximate SVD

In Algorithm 5.2, computing singular value decompositions is the main computational cost. Inspired by the work of Cai et al. [4] and Ma et al. [29], instead of computing the full SVD of the matrix $B$ in each iteration, we implement a variant of the HFP algorithm in which we compute only a rank-$r$ approximation to $B$, where $r$ is an estimate of the rank of the optimal solution. We call this half norm fixed point algorithm with an approximate SVD the HFPA algorithm. This approach greatly reduces the computational effort required by the algorithm. Specifically, we compute an approximate SVD by a fast Monte Carlo algorithm: the Linear Time SVD algorithm developed by Drineas et al. [16]. For a given matrix $A\in\mathbb{R}^{m\times n}$, parameters $c_s,k_s\in\mathbb{Z}_+$ with $1\le k_s\le c_s\le n$, and probabilities $\{p_i\}_{i=1}^n$ with $p_i\ge 0$ and $\sum_{i=1}^n p_i=1$, this algorithm returns an approximation to the largest $k_s$ singular values and the corresponding left singular vectors of the matrix $A$ in linear $O(m+n)$ time. The Linear Time Approximate SVD Algorithm is outlined below.

Linear Time Approximate SVD Algorithm [16, 29]
- Input: $A\in\mathbb{R}^{m\times n}$; $c_s,k_s\in\mathbb{Z}_+$ such that $1\le k_s\le c_s\le n$; $\{p_i\}_{i=1}^n$ such that $p_i\ge 0$, $\sum_{i=1}^n p_i=1$.
- Output: $H_{k_s}\in\mathbb{R}^{m\times k_s}$ and $\sigma_t(C)$, $t=1,2,\dots,k_s$.
- For $t=1$ to $c_s$:
  - pick $i_t\in\{1,2,\dots,n\}$ with $\operatorname{Prob}\{i_t=\alpha\}=p_\alpha$, $\alpha=1,2,\dots,n$;
  - set $C^{(t)}=A^{(i_t)}/\sqrt{c_s\,p_{i_t}}$.
- Compute $C^TC$ and its SVD, say $C^TC=\sum_{t=1}^{c_s}\sigma_t^2(C)\,y^t{y^t}^T$.
- Compute $h^t=Cy^t/\sigma_t(C)$ for $t=1,2,\dots,k_s$.
- Return $H_{k_s}$, where $H_{k_s}^{(t)}=h^t$, and $\sigma_t(C)$, $t=1,2,\dots,k_s$.

The outputs $\sigma_t(C)$ ($t=1,2,\dots,k_s$) are approximations to the largest $k_s$ singular values of $A$, and $H_{k_s}^{(t)}$ ($t=1,2,\dots,k_s$) are approximations to the corresponding left singular vectors. Thus, the SVD of $A$ is approximated by
$$A\approx A_{k_s}:=H_{k_s}\operatorname{Diag}(\sigma(C))\big(A^TH_{k_s}\operatorname{Diag}(1/\sigma(C))\big)^T.$$
Drineas et al. [16] prove that, with high probability, $A_{k_s}$ is an approximation to the best rank-$k_s$ approximation of $A$.

In our numerical experiments, as in [29], we set $c_s=2r_m-2$, where $r_m=\big\lfloor\big(m+n-\sqrt{(m+n)^2-4p}\big)/2\big\rfloor$ is, for a given number of sampled entries, the largest rank of $m\times n$ matrices for which the matrix completion problem has a unique solution. We refer to [29] for how to set $k_s$. We also set all $p_i$ equal to $1/n$. For more details on the choices of the parameters in the Linear Time Approximate SVD Algorithm, please see [16, 29]. The Linear Time Approximate SVD code we use is written by Shiqian Ma and is available at sm2756/fpca.htm.
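A hedged NumPy sketch of the sampling step described above follows (function names are ours; uniform probabilities by default, as used in the experiments; it assumes the sampled top-$k_s$ singular values are nonzero).

```python
import numpy as np

def linear_time_svd(A, cs, ks, p=None, rng=None):
    """Approximate the top ks singular values and left singular vectors of A by
    sampling cs columns with probabilities p and rescaling them (Drineas et al. [16])."""
    rng = np.random.default_rng() if rng is None else rng
    m, n = A.shape
    p = np.full(n, 1.0 / n) if p is None else p
    idx = rng.choice(n, size=cs, replace=True, p=p)          # i_t with Prob{i_t = a} = p_a
    C = A[:, idx] / np.sqrt(cs * p[idx])                     # C^(t) = A^(i_t) / sqrt(cs * p_{i_t})
    w, Y = np.linalg.eigh(C.T @ C)                           # C^T C = sum_t sigma_t(C)^2 y^t y^t^T
    order = np.argsort(w)[::-1][:ks]
    sigma = np.sqrt(np.maximum(w[order], 0.0))
    H = (C @ Y[:, order]) / sigma                            # h^t = C y^t / sigma_t(C)
    return H, sigma

def approx_rank_ks(A, H, sigma):
    """A_ks = H Diag(sigma) (A^T H Diag(1/sigma))^T, the rank-ks approximation of A."""
    V = (A.T @ H) / sigma
    return (H * sigma) @ V.T
```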

7 Numerical experiments

In this section, we report numerical results on a series of matrix completion problems of the form (1.2) to demonstrate the performance of the HFPA algorithm. The purpose of the numerical experiments is to assess the effectiveness, accuracy, robustness and convergence of the algorithm. The effectiveness is measured by how few measurements are required to exactly recover a low-rank matrix: the fewer measurements an algorithm needs, the better the algorithm. Under the same measurements, the shorter the time used by an algorithm and the higher the accuracy it achieves, the better the algorithm. We also test the robustness of the algorithm with respect to varying dimensions, varying ranks and varying sampling ratios. To compare performance in finding low-rank matrix solutions, some other competitive algorithms, namely the singular value thresholding algorithm (SVT²) [4], the fixed point continuation algorithm based on an approximate SVD using the iterative Lanczos algorithm (FPC³) [29], and the fixed point continuation algorithm based on a linear time approximate SVD (FPCA⁴) [29], are also tested together with our HFPA algorithm. Note that the former three algorithms are all based on nuclear norm minimization, while these four algorithms all depend on an approximate SVD. We also note that some manifold-based algorithms without SVD, such as GenRTR [1], RTRMC [3], OptSpace [24] and LMaFit [42], have good performance; because of space constraints, we do not compare with them. All computational experiments were performed in MATLAB R2009a on a Dell desktop computer with an Intel(R) Core(TM) CPU and 3.23 GB of RAM.

² The SVT code is available online; it was written by Emmanuel Candès, October 2008, and last modified by Farshad Harirchi and Stephen Becker in April.
³ The FPC code is available online; it was coded by Stephen Becker in March, with reference to [29].
⁴ The FPCA code is available at sm2756/fpca.htm; it was coded and modified by Shiqian Ma, July 2008 and April 2009, respectively.

In our simulations, we generate $m\times n$ matrices of rank $r$ in the same way as in related work (for instance, [4, 6, 29]): we first generate random matrices $M_L\in\mathbb{R}^{m\times r}$ and $M_R\in\mathbb{R}^{n\times r}$ with i.i.d. Gaussian entries, and then set $M=M_LM_R^T$. We then sample a subset $\Omega$ of $p$ entries uniformly at random. Thus, the entries of $M$ on $\Omega$ are the observed data and $M$ is the true unknown matrix. For each problem with an $m\times n$ matrix $M$, measurement number $p$ and rank $r$, we solve a fixed number of randomly created matrix completion problems. We use SR $:=p/(mn)$, i.e., the number of measurements divided by the number of entries of the matrix, to denote the sampling ratio. Recall that an $m\times n$ matrix of rank $r$ has df $:=r(m+n-r)$ degrees of freedom. Then OS $:=p/\mathrm{df}$ is the oversampling ratio, i.e., the ratio between the number of sampled entries and the true dimensionality of an $m\times n$ matrix of rank $r$. Note that if OS $<1$, there are always infinitely many matrices of rank $r$ consistent with the given entries, so we cannot hope to recover the matrix in this situation. We also note that when OS $\ge 1$, the closer OS is to 1, the more difficult it is to recover the matrix. For this reason, following [29], we call a matrix completion problem an easy problem if its OS and SR are such that OS $\cdot$ SR $>0.5$ and OS $>2.6$, and correspondingly a hard problem if OS $\cdot$ SR $\le 0.5$ or OS $\le 2.6$. In the tables, FR $:=1/\mathrm{OS}=\mathrm{df}/p$ is a quantity often used in the literature. We use "rank" to denote the average rank of the matrices recovered by an algorithm, and "time" and "iter" to denote the average time (in seconds) and the average number of iterations, respectively, that an algorithm takes to reach convergence. We use three relative errors,
$$\text{rel.err}(\Omega):=\frac{\|M(\Omega)-X_{\mathrm{opt}}(\Omega)\|_F}{\|M(\Omega)\|_F},\qquad \text{rel.err.F}:=\frac{\|M-X_{\mathrm{opt}}\|_F}{\|M\|_F},\qquad \text{rel.err.s}:=\frac{\|M-X_{\mathrm{opt}}\|_2}{\|M\|_2},$$
to evaluate the closeness of $X_{\mathrm{opt}}$ to $M$, where $X_{\mathrm{opt}}$ is the solution to (1.2) obtained by an algorithm.

The parameters and initial values used in the HFPA algorithm for matrix completion problems are listed in Table 1.

Table 1: Parameters and initial values in the HFPA algorithm
$\bar\lambda=10^{-4}$, $\eta=1/4$, $\lambda_1=\min\{3,\,mn/p\}\,\|\mathcal{A}^*(b)\|_2$, $X_0=\mathcal{A}^*(b)$, maxiter $=10{,}000$;
if {hard problem and $\max(m,n)<1000$}: $\mu_0=1.7$, $I_m=200$;
elseif {SR $<0.5$ and $\min(m,n)\ge 1000$}: $\mu_0=1.98$, $I_m=10$;
else: $\mu_0=1.5$, $I_m=10$.
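The random test instances described above can be generated with a short sketch like the following (NumPy; the function names are ours).

```python
import numpy as np

def random_completion_problem(m, n, r, p, rng=None):
    """Generate a test instance as in Section 7: M = M_L M_R^T with i.i.d.
    Gaussian factors, and a set Omega of p entries sampled uniformly at random.
    Returns the true matrix M, a boolean mask for Omega, and the observations b."""
    rng = np.random.default_rng() if rng is None else rng
    M = rng.standard_normal((m, r)) @ rng.standard_normal((n, r)).T
    flat = np.zeros(m * n, dtype=bool)
    flat[rng.choice(m * n, size=p, replace=False)] = True
    mask = flat.reshape(m, n)
    return M, mask, M[mask]

def problem_ratios(m, n, r, p):
    """SR = p/(mn), df = r(m+n-r), OS = p/df, FR = df/p."""
    df = r * (m + n - r)
    return {"SR": p / (m * n), "OS": p / df, "FR": df / p}
```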

7.1 Results for randomly created noiseless matrix completion problems

Our first experiment compares the recovering ability of HFPA with SVT, FPC and FPCA on small and easy matrix completion problems; here a small matrix means that the dimension of the matrix is less than 200. Specifically, in the first experiment we take $m=n=100$, OS $=3$, FR $=0.33$, and let the true rank increase from 6 to 16 in steps of 1. The tolerance in the four algorithms is set to $10^{-4}$. For each scale of these problems, we solve 10 randomly created matrix completion problems. The computational results for this experiment are displayed in Table 2.

From Table 2, the first observation is that only HFPA recovers all the true ranks. When $r<11$, the ranks recovered by SVT are larger than the true ranks; the ranks recovered by FPC are also larger than the true ones when $r<10$; the same happens to FPCA when the true rank equals 16. The second observation is that HFPA runs fastest among the four algorithms for most of the problems. As the true ranks change from 6 to 16, the time cost by HFPA is almost unchanged. Although for $r\le 6$ FPCA is slightly faster than HFPA by a few hundredths of a second, HFPA is much faster than FPCA when $r\ge 12$. Obviously, HFPA runs faster than SVT and FPC. Finally, let us compare the accuracies achieved by the four algorithms. We observe that HFPA achieves the most accurate solutions for most of the problems; even when $r\ge 12$, at least one of the three relative errors of HFPA reaches the order of $10^{-6}$; meanwhile, the accuracies of SVT and FPC are not very good when $r\le 7$, and FPCA begins to yield very inaccurate solutions when $r\ge 13$. We conclude that for small and easy matrix completion problems, HFPA is very fast, effective and robust.

Our second experiment compares the recovering abilities of HFPA with SVT, FPC and FPCA on small but hard matrix completion problems. These problems are hard and challenging to recover because the oversampling ratio OS $=2$ is very close to 1, which implies that the observed data are very limited relative to the degrees of freedom of the unknown matrices. In this experiment, we take $m=n=100$, OS $=2$, FR $=0.50$, and let $r$ increase from 2 to 24 in steps of 2. For this set of problems, SR ranges from 7.9% to 84.5%. The tolerance in this experiment is set to $10^{-6}$. For each scale of these problems, we again solve 10 randomly created matrix completion problems. The results are displayed in Table 3.

From Table 3, we find that SVT and FPC do not work well, in the sense that the ranks they recover are far larger than the true ranks and the accuracies of their solutions are poor until the true rank increases to 20. It is clear that FPCA and HFPA both work very well. We observe that as $r$ increases from 2 to 24, the times cost by HFPA and FPCA both increase, but slowly. As we can see, HFPA attains accuracy as good as or slightly better than FPCA, while being clearly faster.

Now we test our algorithm on large randomly created matrix completion problems, running only 5 instances for each large scale problem. The numerical results of HFPA for easy and large matrix completion problems are presented in Table 4. For easy problems, since SVT performs in general better than FPC, we omit the results of FPC in Table 4 for the sake of space. For example, when $m=n=1000$, $r=10$, OS $=6$, SR $=0.119$ and FR $=0.17$, FPC costs more than 350 seconds to recover the matrix while SVT costs only about 8 seconds, and they achieve similar accuracy. From Table 4, we can see that for an unknown matrix of rank 200 with a 38.7% sampling ratio, HFPA recovers it well in only 12 minutes, while SVT needs half an hour and FPCA fails to work.
We also find that, for an unknown matrix of fixed size, decreasing the sampling ratio has little influence on the computational time of HFPA, whereas increasing the sampling ratio remarkably improves its accuracy. We can conclude that for these easy problems, some of which have a very low rank and some of which have a low but not very low rank, HFPA is always powerful enough to recover them.

Table 2: Comparison of SVT, FPC, FPCA and HFPA for randomly created small and easy matrix completion problems ($m=n=100$, $r=6:1:16$, OS $=3$, FR $=0.33$, xtol $=10^{-4}$). For each rank $r$ and sampling ratio SR, the table reports, for each solver, the recovered rank, the number of iterations, the time, and the relative errors rel.err($\Omega$), rel.err.F and rel.err.s.

Table 3: Comparison of SVT, FPC, FPCA and HFPA for randomly created small but hard matrix completion problems ($m=n=100$, $r=2:2:24$, OS $=2$, FR $=0.50$, xtol $=10^{-6}$). For each rank $r$ and sampling ratio SR, the table reports, for each solver, the recovered rank, the number of iterations, the time, and the relative errors rel.err($\Omega$), rel.err.F and rel.err.s. For the smallest rank, SVT is divergent.

Table 4: Comparison of SVT, FPCA and HFPA for randomly created large and easy matrix completion problems (xtol $=10^{-4}$). For each problem ($m=n$, $r$, OS, SR, FR), the table reports the time and rel.err.F of SVT, HFPA and FPCA; for the largest instance, FPCA runs out of memory.

Table 5: Comparison of FPCA and HFPA for randomly created large and hard matrix completion problems (xtol $=10^{-4}$). For each problem ($m=n$, $r$, OS, SR, FR), the table reports the time, rel.err($\Omega$) and rel.err.F of HFPA and FPCA.

For hard problems, without exception, SVT and FPC either diverge, cannot terminate within one hour, or yield very inaccurate solutions. For example, when $m=n=200$, $r=10$ and SR $=0.195$, which is the simplest case, SVT costs more than 300 seconds to recover a matrix of rank 43 with a relative error in the Frobenius norm on the order of $10^{-1}$, while FPC recovers a matrix of rank 69. Another simple example: when $m=n=200$, $r=20$ and SR $=0.380$, SVT costs more than 700 seconds to recover a matrix of rank 87 with a relative error in the Frobenius norm on the order of $10^{-1}$, while FPC recovers a matrix of rank 96. Therefore, in this case, only FPCA is comparable to HFPA. The results are displayed in Table 5. From Table 5, we can see that HFPA still has a powerful recovering ability for hard and large matrix completion problems.

7.2 Results for matrix completion from noisy sampled entries

In this subsection, we briefly demonstrate the results of HFPA for matrix completion problems with noisy sampled entries. Suppose we observe data from the following model:
$$B_{ij}=M_{ij}+Z_{ij},\quad (i,j)\in\Omega, \qquad (7.1)$$
where $Z$ is a zero-mean Gaussian white noise with standard deviation $\sigma$. The results of HFPA, together with those of SVT and FPCA, are displayed in Table 6; the reported quantities are averages over 5 runs. From Table 6, we see that for noisy sampled data HFPA performs as well as or slightly better than FPCA, while it is clearly more powerful than SVT.

Table 6: Numerical results for SVT, HFPA and FPCA on random matrix completion problems with noisy data. For each noise level $\sigma$ and problem ($m=n$, $r$, OS, SR), the table reports the time and rel.err.F of SVT, HFPA and FPCA. Notes: in one case the rank recovered by SVT is 125; in another, the SVT algorithm cannot terminate within one hour.

7.3 Results for real problems

In this subsection, we apply HFPA to image inpainting problems in order to test its effectiveness on real data matrices. It is well known that grayscale images and color images can be expressed by matrices and tensors, respectively. In grayscale image inpainting, the grayscale values of some of the pixels of the image are missing, and we want to fill in these missing values. If the image is of low rank, or of numerical low rank, we can solve the image inpainting problem as a matrix completion problem (1.2) (see, e.g., [29]). Here, Figure 2(a) is a grayscale image of rank 600. We applied the SVD to Figure 2(a) and truncated it to obtain the rank-80 image shown in Figure 2(b).

Figure 2: (a) Original image with full rank; (b) image of rank 80 truncated from (a); (c) 50% randomly masked from (a); (d) recovered image from (c) (rel.err.F $=$ 8.30e-2); (e) 50% randomly masked from (b); (f) recovered image from (e) (rel.err.F $=$ 6.56e-2); (g) deterministically masked from (b); (h) recovered image from (g) (rel.err.F $=$ 6.97e-2).


More information

Information-Theoretic Limits of Matrix Completion

Information-Theoretic Limits of Matrix Completion Information-Theoretic Limits of Matrix Completion Erwin Riegler, David Stotz, and Helmut Bölcskei Dept. IT & EE, ETH Zurich, Switzerland Email: {eriegler, dstotz, boelcskei}@nari.ee.ethz.ch Abstract We

More information

A Randomized Algorithm for the Approximation of Matrices

A Randomized Algorithm for the Approximation of Matrices A Randomized Algorithm for the Approximation of Matrices Per-Gunnar Martinsson, Vladimir Rokhlin, and Mark Tygert Technical Report YALEU/DCS/TR-36 June 29, 2006 Abstract Given an m n matrix A and a positive

More information

Conditions for Robust Principal Component Analysis

Conditions for Robust Principal Component Analysis Rose-Hulman Undergraduate Mathematics Journal Volume 12 Issue 2 Article 9 Conditions for Robust Principal Component Analysis Michael Hornstein Stanford University, mdhornstein@gmail.com Follow this and

More information

Block Coordinate Descent for Regularized Multi-convex Optimization

Block Coordinate Descent for Regularized Multi-convex Optimization Block Coordinate Descent for Regularized Multi-convex Optimization Yangyang Xu and Wotao Yin CAAM Department, Rice University February 15, 2013 Multi-convex optimization Model definition Applications Outline

More information

Minimizing the Difference of L 1 and L 2 Norms with Applications

Minimizing the Difference of L 1 and L 2 Norms with Applications 1/36 Minimizing the Difference of L 1 and L 2 Norms with Department of Mathematical Sciences University of Texas Dallas May 31, 2017 Partially supported by NSF DMS 1522786 2/36 Outline 1 A nonconvex approach:

More information

ECE G: Special Topics in Signal Processing: Sparsity, Structure, and Inference

ECE G: Special Topics in Signal Processing: Sparsity, Structure, and Inference ECE 18-898G: Special Topics in Signal Processing: Sparsity, Structure, and Inference Low-rank matrix recovery via convex relaxations Yuejie Chi Department of Electrical and Computer Engineering Spring

More information

Augmented Lagrangian alternating direction method for matrix separation based on low-rank factorization

Augmented Lagrangian alternating direction method for matrix separation based on low-rank factorization Augmented Lagrangian alternating direction method for matrix separation based on low-rank factorization Yuan Shen Zaiwen Wen Yin Zhang January 11, 2011 Abstract The matrix separation problem aims to separate

More information

Sparse Approximation via Penalty Decomposition Methods

Sparse Approximation via Penalty Decomposition Methods Sparse Approximation via Penalty Decomposition Methods Zhaosong Lu Yong Zhang February 19, 2012 Abstract In this paper we consider sparse approximation problems, that is, general l 0 minimization problems

More information

Least Sparsity of p-norm based Optimization Problems with p > 1

Least Sparsity of p-norm based Optimization Problems with p > 1 Least Sparsity of p-norm based Optimization Problems with p > Jinglai Shen and Seyedahmad Mousavi Original version: July, 07; Revision: February, 08 Abstract Motivated by l p -optimization arising from

More information

Matrix Completion: Fundamental Limits and Efficient Algorithms

Matrix Completion: Fundamental Limits and Efficient Algorithms Matrix Completion: Fundamental Limits and Efficient Algorithms Sewoong Oh PhD Defense Stanford University July 23, 2010 1 / 33 Matrix completion Find the missing entries in a huge data matrix 2 / 33 Example

More information

Homotopy methods based on l 0 norm for the compressed sensing problem

Homotopy methods based on l 0 norm for the compressed sensing problem Homotopy methods based on l 0 norm for the compressed sensing problem Wenxing Zhu, Zhengshan Dong Center for Discrete Mathematics and Theoretical Computer Science, Fuzhou University, Fuzhou 350108, China

More information

ON THE GLOBAL AND LINEAR CONVERGENCE OF THE GENERALIZED ALTERNATING DIRECTION METHOD OF MULTIPLIERS

ON THE GLOBAL AND LINEAR CONVERGENCE OF THE GENERALIZED ALTERNATING DIRECTION METHOD OF MULTIPLIERS ON THE GLOBAL AND LINEAR CONVERGENCE OF THE GENERALIZED ALTERNATING DIRECTION METHOD OF MULTIPLIERS WEI DENG AND WOTAO YIN Abstract. The formulation min x,y f(x) + g(y) subject to Ax + By = b arises in

More information

Matrix completion: Fundamental limits and efficient algorithms. Sewoong Oh Stanford University

Matrix completion: Fundamental limits and efficient algorithms. Sewoong Oh Stanford University Matrix completion: Fundamental limits and efficient algorithms Sewoong Oh Stanford University 1 / 35 Low-rank matrix completion Low-rank Data Matrix Sparse Sampled Matrix Complete the matrix from small

More information

IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 57, NO. 11, NOVEMBER On the Performance of Sparse Recovery

IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 57, NO. 11, NOVEMBER On the Performance of Sparse Recovery IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 57, NO. 11, NOVEMBER 2011 7255 On the Performance of Sparse Recovery Via `p-minimization (0 p 1) Meng Wang, Student Member, IEEE, Weiyu Xu, and Ao Tang, Senior

More information

Research Note. A New Infeasible Interior-Point Algorithm with Full Nesterov-Todd Step for Semi-Definite Optimization

Research Note. A New Infeasible Interior-Point Algorithm with Full Nesterov-Todd Step for Semi-Definite Optimization Iranian Journal of Operations Research Vol. 4, No. 1, 2013, pp. 88-107 Research Note A New Infeasible Interior-Point Algorithm with Full Nesterov-Todd Step for Semi-Definite Optimization B. Kheirfam We

More information

Big Data Analytics: Optimization and Randomization

Big Data Analytics: Optimization and Randomization Big Data Analytics: Optimization and Randomization Tianbao Yang Tutorial@ACML 2015 Hong Kong Department of Computer Science, The University of Iowa, IA, USA Nov. 20, 2015 Yang Tutorial for ACML 15 Nov.

More information

Lecture 5 : Projections

Lecture 5 : Projections Lecture 5 : Projections EE227C. Lecturer: Professor Martin Wainwright. Scribe: Alvin Wan Up until now, we have seen convergence rates of unconstrained gradient descent. Now, we consider a constrained minimization

More information

Compressed Sensing and Robust Recovery of Low Rank Matrices

Compressed Sensing and Robust Recovery of Low Rank Matrices Compressed Sensing and Robust Recovery of Low Rank Matrices M. Fazel, E. Candès, B. Recht, P. Parrilo Electrical Engineering, University of Washington Applied and Computational Mathematics Dept., Caltech

More information

A Customized ADMM for Rank-Constrained Optimization Problems with Approximate Formulations

A Customized ADMM for Rank-Constrained Optimization Problems with Approximate Formulations A Customized ADMM for Rank-Constrained Optimization Problems with Approximate Formulations Chuangchuang Sun and Ran Dai Abstract This paper proposes a customized Alternating Direction Method of Multipliers

More information

Robust Principal Component Analysis

Robust Principal Component Analysis ELE 538B: Mathematics of High-Dimensional Data Robust Principal Component Analysis Yuxin Chen Princeton University, Fall 2018 Disentangling sparse and low-rank matrices Suppose we are given a matrix M

More information

Linear Systems. Carlo Tomasi

Linear Systems. Carlo Tomasi Linear Systems Carlo Tomasi Section 1 characterizes the existence and multiplicity of the solutions of a linear system in terms of the four fundamental spaces associated with the system s matrix and of

More information

Numerical Linear Algebra Primer. Ryan Tibshirani Convex Optimization /36-725

Numerical Linear Algebra Primer. Ryan Tibshirani Convex Optimization /36-725 Numerical Linear Algebra Primer Ryan Tibshirani Convex Optimization 10-725/36-725 Last time: proximal gradient descent Consider the problem min g(x) + h(x) with g, h convex, g differentiable, and h simple

More information

Lecture: Matrix Completion

Lecture: Matrix Completion 1/56 Lecture: Matrix Completion http://bicmr.pku.edu.cn/~wenzw/bigdata2017.html Acknowledgement: this slides is based on Prof. Jure Leskovec and Prof. Emmanuel Candes s lecture notes Recommendation systems

More information

Iterative Methods for Solving A x = b

Iterative Methods for Solving A x = b Iterative Methods for Solving A x = b A good (free) online source for iterative methods for solving A x = b is given in the description of a set of iterative solvers called templates found at netlib: http

More information

Robust Principal Component Analysis Based on Low-Rank and Block-Sparse Matrix Decomposition

Robust Principal Component Analysis Based on Low-Rank and Block-Sparse Matrix Decomposition Robust Principal Component Analysis Based on Low-Rank and Block-Sparse Matrix Decomposition Gongguo Tang and Arye Nehorai Department of Electrical and Systems Engineering Washington University in St Louis

More information

SOLVING A LOW-RANK FACTORIZATION MODEL FOR MATRIX COMPLETION BY A NONLINEAR SUCCESSIVE OVER-RELAXATION ALGORITHM

SOLVING A LOW-RANK FACTORIZATION MODEL FOR MATRIX COMPLETION BY A NONLINEAR SUCCESSIVE OVER-RELAXATION ALGORITHM SOLVING A LOW-RANK FACTORIZATION MODEL FOR MATRIX COMPLETION BY A NONLINEAR SUCCESSIVE OVER-RELAXATION ALGORITHM ZAIWEN WEN, WOTAO YIN, AND YIN ZHANG CAAM TECHNICAL REPORT TR10-07 DEPARTMENT OF COMPUTATIONAL

More information

Nonlinear Optimization for Optimal Control

Nonlinear Optimization for Optimal Control Nonlinear Optimization for Optimal Control Pieter Abbeel UC Berkeley EECS Many slides and figures adapted from Stephen Boyd [optional] Boyd and Vandenberghe, Convex Optimization, Chapters 9 11 [optional]

More information

A Fast Augmented Lagrangian Algorithm for Learning Low-Rank Matrices

A Fast Augmented Lagrangian Algorithm for Learning Low-Rank Matrices A Fast Augmented Lagrangian Algorithm for Learning Low-Rank Matrices Ryota Tomioka 1, Taiji Suzuki 1, Masashi Sugiyama 2, Hisashi Kashima 1 1 The University of Tokyo 2 Tokyo Institute of Technology 2010-06-22

More information

Compressed sensing. Or: the equation Ax = b, revisited. Terence Tao. Mahler Lecture Series. University of California, Los Angeles

Compressed sensing. Or: the equation Ax = b, revisited. Terence Tao. Mahler Lecture Series. University of California, Los Angeles Or: the equation Ax = b, revisited University of California, Los Angeles Mahler Lecture Series Acquiring signals Many types of real-world signals (e.g. sound, images, video) can be viewed as an n-dimensional

More information

Reconstruction from Anisotropic Random Measurements

Reconstruction from Anisotropic Random Measurements Reconstruction from Anisotropic Random Measurements Mark Rudelson and Shuheng Zhou The University of Michigan, Ann Arbor Coding, Complexity, and Sparsity Workshop, 013 Ann Arbor, Michigan August 7, 013

More information

Lecture 8: February 9

Lecture 8: February 9 0-725/36-725: Convex Optimiation Spring 205 Lecturer: Ryan Tibshirani Lecture 8: February 9 Scribes: Kartikeya Bhardwaj, Sangwon Hyun, Irina Caan 8 Proximal Gradient Descent In the previous lecture, we

More information

Linear Systems. Carlo Tomasi. June 12, r = rank(a) b range(a) n r solutions

Linear Systems. Carlo Tomasi. June 12, r = rank(a) b range(a) n r solutions Linear Systems Carlo Tomasi June, 08 Section characterizes the existence and multiplicity of the solutions of a linear system in terms of the four fundamental spaces associated with the system s matrix

More information

Optimization Theory. A Concise Introduction. Jiongmin Yong

Optimization Theory. A Concise Introduction. Jiongmin Yong October 11, 017 16:5 ws-book9x6 Book Title Optimization Theory 017-08-Lecture Notes page 1 1 Optimization Theory A Concise Introduction Jiongmin Yong Optimization Theory 017-08-Lecture Notes page Optimization

More information

RECOVERING LOW-RANK AND SPARSE COMPONENTS OF MATRICES FROM INCOMPLETE AND NOISY OBSERVATIONS. December

RECOVERING LOW-RANK AND SPARSE COMPONENTS OF MATRICES FROM INCOMPLETE AND NOISY OBSERVATIONS. December RECOVERING LOW-RANK AND SPARSE COMPONENTS OF MATRICES FROM INCOMPLETE AND NOISY OBSERVATIONS MIN TAO AND XIAOMING YUAN December 31 2009 Abstract. Many applications arising in a variety of fields can be

More information

Uses of duality. Geoff Gordon & Ryan Tibshirani Optimization /

Uses of duality. Geoff Gordon & Ryan Tibshirani Optimization / Uses of duality Geoff Gordon & Ryan Tibshirani Optimization 10-725 / 36-725 1 Remember conjugate functions Given f : R n R, the function is called its conjugate f (y) = max x R n yt x f(x) Conjugates appear

More information

Projection methods to solve SDP

Projection methods to solve SDP Projection methods to solve SDP Franz Rendl http://www.math.uni-klu.ac.at Alpen-Adria-Universität Klagenfurt Austria F. Rendl, Oberwolfach Seminar, May 2010 p.1/32 Overview Augmented Primal-Dual Method

More information

Principal Component Analysis

Principal Component Analysis Machine Learning Michaelmas 2017 James Worrell Principal Component Analysis 1 Introduction 1.1 Goals of PCA Principal components analysis (PCA) is a dimensionality reduction technique that can be used

More information

Accelerated primal-dual methods for linearly constrained convex problems

Accelerated primal-dual methods for linearly constrained convex problems Accelerated primal-dual methods for linearly constrained convex problems Yangyang Xu SIAM Conference on Optimization May 24, 2017 1 / 23 Accelerated proximal gradient For convex composite problem: minimize

More information

Sparse signals recovered by non-convex penalty in quasi-linear systems

Sparse signals recovered by non-convex penalty in quasi-linear systems Cui et al. Journal of Inequalities and Applications 018) 018:59 https://doi.org/10.1186/s13660-018-165-8 R E S E A R C H Open Access Sparse signals recovered by non-conve penalty in quasi-linear systems

More information

Lecture Note 7: Iterative methods for solving linear systems. Xiaoqun Zhang Shanghai Jiao Tong University

Lecture Note 7: Iterative methods for solving linear systems. Xiaoqun Zhang Shanghai Jiao Tong University Lecture Note 7: Iterative methods for solving linear systems Xiaoqun Zhang Shanghai Jiao Tong University Last updated: December 24, 2014 1.1 Review on linear algebra Norms of vectors and matrices vector

More information

Sparse Optimization Lecture: Basic Sparse Optimization Models

Sparse Optimization Lecture: Basic Sparse Optimization Models Sparse Optimization Lecture: Basic Sparse Optimization Models Instructor: Wotao Yin July 2013 online discussions on piazza.com Those who complete this lecture will know basic l 1, l 2,1, and nuclear-norm

More information

A Randomized Nonmonotone Block Proximal Gradient Method for a Class of Structured Nonlinear Programming

A Randomized Nonmonotone Block Proximal Gradient Method for a Class of Structured Nonlinear Programming A Randomized Nonmonotone Block Proximal Gradient Method for a Class of Structured Nonlinear Programming Zhaosong Lu Lin Xiao March 9, 2015 (Revised: May 13, 2016; December 30, 2016) Abstract We propose

More information

Compressed Sensing via Partial l 1 Minimization

Compressed Sensing via Partial l 1 Minimization WORCESTER POLYTECHNIC INSTITUTE Compressed Sensing via Partial l 1 Minimization by Lu Zhong A thesis Submitted to the Faculty of the WORCESTER POLYTECHNIC INSTITUTE in partial fulfillment of the requirements

More information

A GENERAL INERTIAL PROXIMAL POINT METHOD FOR MIXED VARIATIONAL INEQUALITY PROBLEM

A GENERAL INERTIAL PROXIMAL POINT METHOD FOR MIXED VARIATIONAL INEQUALITY PROBLEM A GENERAL INERTIAL PROXIMAL POINT METHOD FOR MIXED VARIATIONAL INEQUALITY PROBLEM CAIHUA CHEN, SHIQIAN MA, AND JUNFENG YANG Abstract. In this paper, we first propose a general inertial proximal point method

More information

We describe the generalization of Hazan s algorithm for symmetric programming

We describe the generalization of Hazan s algorithm for symmetric programming ON HAZAN S ALGORITHM FOR SYMMETRIC PROGRAMMING PROBLEMS L. FAYBUSOVICH Abstract. problems We describe the generalization of Hazan s algorithm for symmetric programming Key words. Symmetric programming,

More information

A direct formulation for sparse PCA using semidefinite programming

A direct formulation for sparse PCA using semidefinite programming A direct formulation for sparse PCA using semidefinite programming A. d Aspremont, L. El Ghaoui, M. Jordan, G. Lanckriet ORFE, Princeton University & EECS, U.C. Berkeley A. d Aspremont, INFORMS, Denver,

More information

CS598 Machine Learning in Computational Biology (Lecture 5: Matrix - part 2) Professor Jian Peng Teaching Assistant: Rongda Zhu

CS598 Machine Learning in Computational Biology (Lecture 5: Matrix - part 2) Professor Jian Peng Teaching Assistant: Rongda Zhu CS598 Machine Learning in Computational Biology (Lecture 5: Matrix - part 2) Professor Jian Peng Teaching Assistant: Rongda Zhu Feature engineering is hard 1. Extract informative features from domain knowledge

More information

Introduction to Compressed Sensing

Introduction to Compressed Sensing Introduction to Compressed Sensing Alejandro Parada, Gonzalo Arce University of Delaware August 25, 2016 Motivation: Classical Sampling 1 Motivation: Classical Sampling Issues Some applications Radar Spectral

More information

Accelerated Block-Coordinate Relaxation for Regularized Optimization

Accelerated Block-Coordinate Relaxation for Regularized Optimization Accelerated Block-Coordinate Relaxation for Regularized Optimization Stephen J. Wright Computer Sciences University of Wisconsin, Madison October 09, 2012 Problem descriptions Consider where f is smooth

More information

Kaisa Joki Adil M. Bagirov Napsu Karmitsa Marko M. Mäkelä. New Proximal Bundle Method for Nonsmooth DC Optimization

Kaisa Joki Adil M. Bagirov Napsu Karmitsa Marko M. Mäkelä. New Proximal Bundle Method for Nonsmooth DC Optimization Kaisa Joki Adil M. Bagirov Napsu Karmitsa Marko M. Mäkelä New Proximal Bundle Method for Nonsmooth DC Optimization TUCS Technical Report No 1130, February 2015 New Proximal Bundle Method for Nonsmooth

More information

Acyclic Semidefinite Approximations of Quadratically Constrained Quadratic Programs

Acyclic Semidefinite Approximations of Quadratically Constrained Quadratic Programs Acyclic Semidefinite Approximations of Quadratically Constrained Quadratic Programs Raphael Louca & Eilyan Bitar School of Electrical and Computer Engineering American Control Conference (ACC) Chicago,

More information

Bindel, Fall 2009 Matrix Computations (CS 6210) Week 8: Friday, Oct 17

Bindel, Fall 2009 Matrix Computations (CS 6210) Week 8: Friday, Oct 17 Logistics Week 8: Friday, Oct 17 1. HW 3 errata: in Problem 1, I meant to say p i < i, not that p i is strictly ascending my apologies. You would want p i > i if you were simply forming the matrices and

More information

Recovering any low-rank matrix, provably

Recovering any low-rank matrix, provably Recovering any low-rank matrix, provably Rachel Ward University of Texas at Austin October, 2014 Joint work with Yudong Chen (U.C. Berkeley), Srinadh Bhojanapalli and Sujay Sanghavi (U.T. Austin) Matrix

More information

arxiv: v1 [math.oc] 23 May 2017

arxiv: v1 [math.oc] 23 May 2017 A DERANDOMIZED ALGORITHM FOR RP-ADMM WITH SYMMETRIC GAUSS-SEIDEL METHOD JINCHAO XU, KAILAI XU, AND YINYU YE arxiv:1705.08389v1 [math.oc] 23 May 2017 Abstract. For multi-block alternating direction method

More information

Homework 5. Convex Optimization /36-725

Homework 5. Convex Optimization /36-725 Homework 5 Convex Optimization 10-725/36-725 Due Tuesday November 22 at 5:30pm submitted to Christoph Dann in Gates 8013 (Remember to a submit separate writeup for each problem, with your name at the top)

More information

Key Words: low-rank matrix recovery, iteratively reweighted least squares, matrix completion.

Key Words: low-rank matrix recovery, iteratively reweighted least squares, matrix completion. LOW-RANK MATRIX RECOVERY VIA ITERATIVELY REWEIGHTED LEAST SQUARES MINIMIZATION MASSIMO FORNASIER, HOLGER RAUHUT, AND RACHEL WARD Abstract. We present and analyze an efficient implementation of an iteratively

More information

Convex Optimization on Large-Scale Domains Given by Linear Minimization Oracles

Convex Optimization on Large-Scale Domains Given by Linear Minimization Oracles Convex Optimization on Large-Scale Domains Given by Linear Minimization Oracles Arkadi Nemirovski H. Milton Stewart School of Industrial and Systems Engineering Georgia Institute of Technology Joint research

More information

Numerical Methods I Non-Square and Sparse Linear Systems

Numerical Methods I Non-Square and Sparse Linear Systems Numerical Methods I Non-Square and Sparse Linear Systems Aleksandar Donev Courant Institute, NYU 1 donev@courant.nyu.edu 1 MATH-GA 2011.003 / CSCI-GA 2945.003, Fall 2014 September 25th, 2014 A. Donev (Courant

More information

Optimization methods

Optimization methods Optimization methods Optimization-Based Data Analysis http://www.cims.nyu.edu/~cfgranda/pages/obda_spring16 Carlos Fernandez-Granda /8/016 Introduction Aim: Overview of optimization methods that Tend to

More information

A Cross-Associative Neural Network for SVD of Nonsquared Data Matrix in Signal Processing

A Cross-Associative Neural Network for SVD of Nonsquared Data Matrix in Signal Processing IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 12, NO. 5, SEPTEMBER 2001 1215 A Cross-Associative Neural Network for SVD of Nonsquared Data Matrix in Signal Processing Da-Zheng Feng, Zheng Bao, Xian-Da Zhang

More information

THE SINGULAR VALUE DECOMPOSITION MARKUS GRASMAIR

THE SINGULAR VALUE DECOMPOSITION MARKUS GRASMAIR THE SINGULAR VALUE DECOMPOSITION MARKUS GRASMAIR 1. Definition Existence Theorem 1. Assume that A R m n. Then there exist orthogonal matrices U R m m V R n n, values σ 1 σ 2... σ p 0 with p = min{m, n},

More information

Elaine T. Hale, Wotao Yin, Yin Zhang

Elaine T. Hale, Wotao Yin, Yin Zhang , Wotao Yin, Yin Zhang Department of Computational and Applied Mathematics Rice University McMaster University, ICCOPT II-MOPTA 2007 August 13, 2007 1 with Noise 2 3 4 1 with Noise 2 3 4 1 with Noise 2

More information

A Brief Overview of Practical Optimization Algorithms in the Context of Relaxation

A Brief Overview of Practical Optimization Algorithms in the Context of Relaxation A Brief Overview of Practical Optimization Algorithms in the Context of Relaxation Zhouchen Lin Peking University April 22, 2018 Too Many Opt. Problems! Too Many Opt. Algorithms! Zero-th order algorithms:

More information

15-780: LinearProgramming

15-780: LinearProgramming 15-780: LinearProgramming J. Zico Kolter February 1-3, 2016 1 Outline Introduction Some linear algebra review Linear programming Simplex algorithm Duality and dual simplex 2 Outline Introduction Some linear

More information

Recent Developments in Compressed Sensing

Recent Developments in Compressed Sensing Recent Developments in Compressed Sensing M. Vidyasagar Distinguished Professor, IIT Hyderabad m.vidyasagar@iith.ac.in, www.iith.ac.in/ m vidyasagar/ ISL Seminar, Stanford University, 19 April 2018 Outline

More information

HYBRID JACOBIAN AND GAUSS SEIDEL PROXIMAL BLOCK COORDINATE UPDATE METHODS FOR LINEARLY CONSTRAINED CONVEX PROGRAMMING

HYBRID JACOBIAN AND GAUSS SEIDEL PROXIMAL BLOCK COORDINATE UPDATE METHODS FOR LINEARLY CONSTRAINED CONVEX PROGRAMMING SIAM J. OPTIM. Vol. 8, No. 1, pp. 646 670 c 018 Society for Industrial and Applied Mathematics HYBRID JACOBIAN AND GAUSS SEIDEL PROXIMAL BLOCK COORDINATE UPDATE METHODS FOR LINEARLY CONSTRAINED CONVEX

More information

ACCELERATED FIRST-ORDER PRIMAL-DUAL PROXIMAL METHODS FOR LINEARLY CONSTRAINED COMPOSITE CONVEX PROGRAMMING

ACCELERATED FIRST-ORDER PRIMAL-DUAL PROXIMAL METHODS FOR LINEARLY CONSTRAINED COMPOSITE CONVEX PROGRAMMING ACCELERATED FIRST-ORDER PRIMAL-DUAL PROXIMAL METHODS FOR LINEARLY CONSTRAINED COMPOSITE CONVEX PROGRAMMING YANGYANG XU Abstract. Motivated by big data applications, first-order methods have been extremely

More information

A Smoothing Newton Method for Solving Absolute Value Equations

A Smoothing Newton Method for Solving Absolute Value Equations A Smoothing Newton Method for Solving Absolute Value Equations Xiaoqin Jiang Department of public basic, Wuhan Yangtze Business University, Wuhan 430065, P.R. China 392875220@qq.com Abstract: In this paper,

More information

Sparse Optimization Lecture: Dual Methods, Part I

Sparse Optimization Lecture: Dual Methods, Part I Sparse Optimization Lecture: Dual Methods, Part I Instructor: Wotao Yin July 2013 online discussions on piazza.com Those who complete this lecture will know dual (sub)gradient iteration augmented l 1 iteration

More information

Spectral gradient projection method for solving nonlinear monotone equations

Spectral gradient projection method for solving nonlinear monotone equations Journal of Computational and Applied Mathematics 196 (2006) 478 484 www.elsevier.com/locate/cam Spectral gradient projection method for solving nonlinear monotone equations Li Zhang, Weijun Zhou Department

More information

Rapid, Robust, and Reliable Blind Deconvolution via Nonconvex Optimization

Rapid, Robust, and Reliable Blind Deconvolution via Nonconvex Optimization Rapid, Robust, and Reliable Blind Deconvolution via Nonconvex Optimization Shuyang Ling Department of Mathematics, UC Davis Oct.18th, 2016 Shuyang Ling (UC Davis) 16w5136, Oaxaca, Mexico Oct.18th, 2016

More information