$S_{1/2}$ Regularization Methods and Fixed Point Algorithms for Affine Rank Minimization Problems


Dingtao Peng, Naihua Xiu and Jian Yu

Abstract. The affine rank minimization problem is to minimize the rank of a matrix under linear constraints. It has many applications in various areas such as statistics, control, system identification and machine learning. Unlike the literature that uses the nuclear norm or the general Schatten $q$ $(0<q<1)$ quasi-norm to approximate the rank of a matrix, in this paper we use the Schatten $1/2$ quasi-norm approximation, which is a better approximation of the rank than the nuclear norm but leads to a nonconvex, nonsmooth and non-Lipschitz optimization problem. Importantly, we give a globally necessary optimality condition for the $S_{1/2}$ regularization problem by virtue of its special objective function. This is very different from the local optimality conditions usually used for general $S_q$ regularization problems. Explicitly, the global optimality condition for the $S_{1/2}$ regularization problem is a fixed point equation associated with the singular value half thresholding operator. Naturally, we propose a fixed point iterative scheme for the problem and provide the convergence analysis of this iteration. By discussing the location and setting of the optimal regularization parameter, and by using an approximate singular value decomposition procedure, we obtain a very efficient algorithm for the $S_{1/2}$ regularization problem: the half norm fixed point algorithm with an approximate SVD (HFPA algorithm). Numerical experiments on randomly generated and real matrix completion problems are presented to demonstrate the effectiveness of the proposed algorithm.

Key words. affine rank minimization problem; matrix completion problem; $S_{1/2}$ regularization problem; fixed point algorithm; singular value half thresholding operator

AMS Subject Classification. 90C06, 90C26, 90C59, 65F22

1 Introduction

The affine rank minimization problem, which is to minimize the rank of a matrix under linear constraints, can be described as follows:
$$\min_{X\in\mathbb{R}^{m\times n}}\ \operatorname{rank}(X)\quad \text{s.t.}\quad \mathcal{A}(X)=b, \qquad (1.1)$$
where $b\in\mathbb{R}^p$ is a given vector and $\mathcal{A}:\mathbb{R}^{m\times n}\to\mathbb{R}^p$ is a given linear transformation determined by $p$ matrices $A_1,\dots,A_p\in\mathbb{R}^{m\times n}$ via
$$\mathcal{A}(X):=\big[\langle A_1,X\rangle,\dots,\langle A_p,X\rangle\big]^T \quad \text{for all } X\in\mathbb{R}^{m\times n},$$
with $\langle A_i,X\rangle:=\operatorname{trace}(A_i^TX)$, $i=1,\dots,p$.

College of Science, Guizhou University, Guiyang, Guizhou, China; and School of Science, Beijing Jiaotong University, Beijing, China (dingtaopeng@126.com). This author was supported by the NSFC grant and the Guizhou Provincial Science and Technology Foundation grant. School of Science, Beijing Jiaotong University, Beijing, China (nhxiu@bjtu.edu.cn). This author was supported by the National Basic Research Program of China grant 2010CB and the NSFC grant. College of Science, Guizhou University, Guiyang, Guizhou, China (sci.jyu@gzu.edu.cn).

An important special case of (1.1) is the matrix completion problem [6]
$$\min_{X\in\mathbb{R}^{m\times n}}\ \operatorname{rank}(X)\quad \text{s.t.}\quad X_{ij}=M_{ij},\ (i,j)\in\Omega, \qquad (1.2)$$
where $X$ and $M$ are both $m\times n$ matrices, $\Omega$ is a subset of index pairs $(i,j)$, and a small subset $\{M_{ij}\mid(i,j)\in\Omega\}$ of the entries is known.

Many applications arising in various areas can be captured by the model (1.1), for instance, low-degree statistical models for a random process [17, 36], low-order realization of linear control systems [19, 37], low-dimensional embedding of data in Euclidean spaces [20], system identification in engineering [28], machine learning [32], and other applications [18]. The matrix completion problem (1.2) arises frequently; examples include the Netflix problem, global positioning, remote sensing and so on [5, 6]. Moreover, problem (1.1) is an extension of the well-known sparse signal recovery (or compressed sensing) problem, which is formulated as finding a sparsest solution of an underdetermined system of linear equations [7, 15].

Problem (1.1) was considered by Fazel [18], who analyzed its computational complexity and proved that it is NP-hard. To overcome this difficulty, Fazel [18] and other researchers (e.g., [6, 8, 34]) suggested relaxing the rank of $X$ by the nuclear norm, that is, considering the nuclear norm minimization problem
$$\min_{X\in\mathbb{R}^{m\times n}}\ \|X\|_*\quad \text{s.t.}\quad \mathcal{A}(X)=b, \qquad (1.3)$$
or the nuclear norm regularization problem
$$\min_{X\in\mathbb{R}^{m\times n}}\ \|\mathcal{A}(X)-b\|_2^2+\lambda\|X\|_* \qquad (1.4)$$
if the data contain noise, where $\|X\|_*$ is the nuclear norm of $X$, i.e., the sum of its singular values. It is well known that problems (1.3) and (1.4) are both convex and can therefore be solved more easily (at least in theory) than (1.1). Many existing algorithms rely on the nuclear norm. For example, problem (1.3) can be reformulated as a semidefinite program [34] and solved by SDPT3 [41]; Lin et al. [26] and Tao and Yuan [39] adopt augmented Lagrangian multiplier (ALM) methods to solve robust PCA problems and their extensions, which contain the matrix completion problem as a special case; SVT [4] solves (1.3) by applying a singular value thresholding operator; Toh and Yun [40] solve a general model that contains (1.3) as a special case by an accelerated proximal gradient (APG) method; Liu, Sun and Toh [27] present a framework of proximal point algorithms in the primal, dual and primal-dual forms for solving nuclear norm minimization with linear equality and second order cone constraints; Ma, Goldfarb and Chen [29] proposed fixed point and Bregman iterative algorithms for solving problem (1.3).

Considering the nonconvexity of the original problem (1.1), some researchers [23, 25, 30, 31, 33] suggest using the Schatten $q$ $(0<q<1)$ quasi-norm (for short, $q$ norm) relaxation, that is, solving the $q$-norm minimization problem
$$\min_{X\in\mathbb{R}^{m\times n}}\ \|X\|_q^q\quad \text{s.t.}\quad \mathcal{A}(X)=b, \qquad (1.5)$$
or the $S_q$ regularization problem
$$\min_{X\in\mathbb{R}^{m\times n}}\ \|\mathcal{A}(X)-b\|_2^2+\lambda\|X\|_q^q \qquad (1.6)$$
if the data contain noise, where the Schatten $q$ quasi-norm of $X$ is defined by $\|X\|_q^q:=\sum_{i=1}^{\min\{m,n\}}\sigma_i^q$ and $\sigma_i$ ($i=1,\dots,\min\{m,n\}$) are the singular values of $X$. Problem (1.5) is intermediate between (1.1) and (1.3) in the sense that
$$\operatorname{rank}(X)=\sum_{i=1}^{\min\{m,n\}}\sigma_i^0,\qquad \|X\|_q^q=\sum_{i=1}^{\min\{m,n\}}\sigma_i^q,\qquad \|X\|_*=\sum_{i=1}^{\min\{m,n\}}\sigma_i^1.$$
Obviously, the $q$ quasi-norm is a better approximation of the rank function than the nuclear norm, but it leads to a nonconvex, nonsmooth, non-Lipschitz optimization problem whose global minimizers are difficult to find.

In fact, the nonconvex relaxation method was first proposed in the area of sparse signal recovery [9, 10]. Recently, nonconvex regularization methods associated with the $\ell_q$ $(0<q<1)$ norm have attracted much attention, and many theoretical results and algorithms have been developed for the resulting nonconvex, nonsmooth, even non-Lipschitz optimization problems; see, e.g., [2, 14, 23, 25, 31]. Extensive computational results have shown that using the $\ell_q$ norm can recover very sparse solutions from very few measurements; see, e.g., [9-14, 25, 31, 36, 45]. However, since the $\ell_q$ norm minimization is a nonconvex, nonsmooth and non-Lipschitz problem, it is in general difficult to give a theoretical guarantee of finding a global solution. Moreover, which $q$ should be selected is another interesting problem. The results in [43-45] revealed that the $\ell_{1/2}$ relaxation can in some sense be regarded as a representative among all the $\ell_q$ relaxations with $q\in(0,1)$: the $\ell_{1/2}$ relaxation has a more powerful recovering ability than the $\ell_q$ relaxation for $1/2<q<1$, while the recovering ability differs little between the $\ell_{1/2}$ relaxation and the $\ell_q$ relaxation for $0<q<1/2$. Moreover, Xu et al. [44] in fact provided a globally necessary optimality condition for the $\ell_{1/2}$ regularization problem, expressed as a fixed point equation involving the half thresholding function; this condition may not hold at local minimizers. They then developed a fast iterative half thresholding algorithm for the $\ell_{1/2}$ regularization problem, which matches the iterative hard thresholding algorithm for the $\ell_0$ regularization problem and the iterative soft thresholding algorithm for the $\ell_1$ regularization problem.

In this paper, inspired by the works on nonconvex regularization, especially the $\ell_{1/2}$ regularization mentioned above, we focus our attention on the following $S_{1/2}$ regularization problem:
$$\min_{X\in\mathbb{R}^{m\times n}}\ \Big\{\|\mathcal{A}(X)-b\|_2^2+\lambda\|X\|_{1/2}^{1/2}\Big\}, \qquad (1.7)$$
where $\|X\|_{1/2}^{1/2}=\sum_{i=1}^{\min\{m,n\}}\sigma_i^{1/2}$ and $\sigma_i$ ($i=1,\dots,\min\{m,n\}$) are the singular values of $X$.
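For concreteness, the following minimal sketch (NumPy; the helper names `schatten_q_quasinorm`, `s_half_objective` and the callable `A_op` are illustrative choices, not from the paper) evaluates the Schatten $q$ quasi-norm and the objective of (1.7) from the singular values of $X$.

```python
import numpy as np

def schatten_q_quasinorm(X, q=0.5):
    """||X||_q^q = sum_i sigma_i(X)^q; q = 1/2 gives ||X||_{1/2}^{1/2}."""
    sigma = np.linalg.svd(X, compute_uv=False)
    return float(np.sum(sigma ** q))

def s_half_objective(X, A_op, b, lam):
    """Objective of (1.7): ||A(X) - b||_2^2 + lam * ||X||_{1/2}^{1/2},
    where A_op is any callable implementing the linear map A."""
    r = A_op(X) - b
    return float(r @ r) + lam * schatten_q_quasinorm(X, 0.5)
```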

This paper is organized as follows. In Section 2, we briefly discuss the relation between the global minimizers of problem (1.5) and problem (1.6). In Section 3, we deduce an analytical thresholding expression associated with the solutions to problem (1.7), and establish an exact lower bound on the nonzero singular values of the solutions. Moreover, we prove that the solutions to problem (1.7) are fixed points of a matrix-valued thresholding operator. In Section 4, based on the fixed point condition, we give a natural iterative formula and provide the convergence analysis of the proposed iteration. Section 5 discusses the location of the optimal regularization parameter and the setting of the parameter, which coincides with the fixed point continuation technique used in convex optimization. Since the singular value decomposition is computationally expensive, in Section 6 we employ an approximate singular value decomposition procedure to cut the computational cost. Thus we obtain a very fast, robust and powerful algorithm, which we call the HFPA algorithm (half norm fixed point algorithm with an approximate SVD). Numerical experiments on randomly generated and real matrix completion problems are presented in Section 7 to demonstrate the effectiveness of the HFPA algorithm. Finally, we conclude our results in Section 8.

Before continuing, we summarize the notation used in this paper. Throughout, without loss of generality, we always suppose $m\le n$. Let $\|x\|_2$ denote the Euclidean norm of any vector $x\in\mathbb{R}^p$. For any $x,y\in\mathbb{R}^p$, $\langle x,y\rangle=x^Ty$ denotes the inner product of two vectors. For any matrix $X\in\mathbb{R}^{m\times n}$, $\sigma(X)=(\sigma_1(X),\dots,\sigma_m(X))^T$ denotes the vector of singular values of $X$ arranged in nonincreasing order, written simply as $\sigma=(\sigma_1,\dots,\sigma_m)^T$ if no confusion is caused; $\operatorname{Diag}(\sigma(X))$ denotes a diagonal matrix whose diagonal vector is $\sigma(X)$; and $\|X\|_F$ denotes the Frobenius norm of $X$, i.e., $\|X\|_F=\big(\sum_{i,j}X_{ij}^2\big)^{1/2}=\big(\sum_{i=1}^m\sigma_i^2\big)^{1/2}$. For any $X,Y\in\mathbb{R}^{m\times n}$, $\langle X,Y\rangle=\operatorname{tr}(Y^TX)$ denotes the inner product of two matrices. Let the linear transformation $\mathcal{A}:\mathbb{R}^{m\times n}\to\mathbb{R}^p$ be determined by $p$ given matrices $A_1,\dots,A_p\in\mathbb{R}^{m\times n}$, that is, $\mathcal{A}(X)=\big(\langle A_1,X\rangle,\dots,\langle A_p,X\rangle\big)^T$. Define $A=(\operatorname{vec}(A_1),\dots,\operatorname{vec}(A_p))^T\in\mathbb{R}^{p\times mn}$ and $x=\operatorname{vec}(X)\in\mathbb{R}^{mn}$, where $\operatorname{vec}(\cdot)$ is the stretch operator; then we have $\mathcal{A}(X)=Ax$ and $\|\mathcal{A}(X)\|_2\le\|\mathcal{A}\|\,\|X\|_F$, where $\|\mathcal{A}\|:=\max\{\|\mathcal{A}(X)\|_2:\|X\|_F=1\}=\|A\|_2$ and $\|A\|_2$ is the spectral norm of the matrix $A$. Let $\mathcal{A}^*$ denote the adjoint of $\mathcal{A}$. Then for any $y\in\mathbb{R}^p$, we have $\mathcal{A}^*y=\sum_{i=1}^p y_iA_i$ and $\langle\mathcal{A}(X),y\rangle=\langle X,\mathcal{A}^*y\rangle=\langle\operatorname{vec}(X),\operatorname{vec}(\mathcal{A}^*y)\rangle=\langle x,A^Ty\rangle$.
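The following sketch (NumPy; the factory names are illustrative, not from the paper) builds the pair $\mathcal{A},\mathcal{A}^*$ from general matrices $A_1,\dots,A_p$, and the sampling operator of the matrix completion problem (1.2) as a special case.

```python
import numpy as np

def make_affine_operator(A_mats):
    """A(X) = (<A_1,X>, ..., <A_p,X>)^T and its adjoint A^*(y) = sum_i y_i A_i,
    built from a list of m x n matrices A_1, ..., A_p."""
    def A_op(X):
        return np.array([np.tensordot(Ai, X) for Ai in A_mats])  # <A_i, X> = tr(A_i^T X)
    def A_adj(y):
        return sum(yi * Ai for yi, Ai in zip(y, A_mats))
    return A_op, A_adj

def make_completion_operator(mask):
    """Sampling operator of (1.2): A(X) returns the entries of X on Omega (a boolean
    mask); A^*(y) scatters them back into a zero matrix.  Here ||A|| = 1."""
    idx = np.where(mask)
    def A_op(X):
        return X[idx]
    def A_adj(y):
        Z = np.zeros(mask.shape)
        Z[idx] = y
        return Z
    return A_op, A_adj
```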

2 Relation between global minimizers of problem (1.5) and problem (1.6)

We now show that, in some sense, problem (1.5) can be solved by solving problem (1.6). The theorem here is general and covers problem (1.7) as a special case. We note that the regularization term $\|X\|_q^q$ is nonconvex, nonsmooth and non-Lipschitz, hence the result is nontrivial.

Theorem 2.1 For each $\lambda>0$, the set of global minimizers of (1.6) is nonempty and bounded. Let $\{\lambda_k\}$ be a decreasing sequence of positive numbers with $\lambda_k\to 0$, and let $X_{\lambda_k}$ be a global minimizer of problem (1.6) with $\lambda=\lambda_k$. Suppose that problem (1.5) is feasible; then $\{X_{\lambda_k}\}$ is bounded and any of its accumulation points is a global minimizer of problem (1.5).

Proof. Since $C_\lambda(X):=\|\mathcal{A}(X)-b\|_2^2+\lambda\|X\|_q^q\ge\lambda\|X\|_q^q$, the objective function $C_\lambda(X)$ is bounded from below and is coercive, i.e., $C_\lambda(X)\to\infty$ as $\|X\|_F\to\infty$; hence the set of global minimizers of (1.6) is nonempty and bounded.

Suppose that problem (1.5) is feasible and $\bar X$ is any feasible point, so that $\mathcal{A}(\bar X)=b$. Since $X_{\lambda_k}$ is a global minimizer of problem (1.6) with $\lambda=\lambda_k$, we have
$$\max\big\{\lambda_k\|X_{\lambda_k}\|_q^q,\ \|\mathcal{A}(X_{\lambda_k})-b\|_2^2\big\}\le\lambda_k\|X_{\lambda_k}\|_q^q+\|\mathcal{A}(X_{\lambda_k})-b\|_2^2\le\lambda_k\|\bar X\|_q^q+\|\mathcal{A}(\bar X)-b\|_2^2=\lambda_k\|\bar X\|_q^q.$$
From $\lambda_k\|X_{\lambda_k}\|_q^q\le\lambda_k\|\bar X\|_q^q$, we get $\|X_{\lambda_k}\|_q^q\le\|\bar X\|_q^q$, that is, the sequence $\{X_{\lambda_k}\}$ is bounded. Thus, $\{X_{\lambda_k}\}$ has at least one accumulation point. Let $X^*$ be any accumulation point of $\{X_{\lambda_k}\}$. From $\|\mathcal{A}(X_{\lambda_k})-b\|_2^2\le\lambda_k\|\bar X\|_q^q$ and $\lambda_k\to 0$, we derive $\mathcal{A}(X^*)=b$, that is, $X^*$ is a feasible point of problem (1.5). It follows from $\|X_{\lambda_k}\|_q^q\le\|\bar X\|_q^q$ that $\|X^*\|_q^q\le\|\bar X\|_q^q$. Then, by the arbitrariness of $\bar X$, we obtain that $X^*$ is a global minimizer of problem (1.5).

3 Globally necessary optimality condition

In this section, we give a globally necessary optimality condition for problem (1.7), which may not hold at local minimizers. This condition is expressed as a matrix-valued fixed point equation associated with a special thresholding operator, which we call the half thresholding operator. Before studying the $S_{1/2}$ regularization problem, we begin by introducing the half thresholding operator.

3.1 Half thresholding operator

First, we introduce the half thresholding function, which minimizes a real-valued function. The following key lemma follows from, but is different from, Xu et al. [44].

Lemma 3.1 Let $t\in\mathbb{R}$ and $\lambda>0$ be two given real numbers. Suppose that $x^*\in\mathbb{R}$ is a global minimizer of the problem
$$\min_{x\ge 0}\ f(x):=(x-t)^2+\lambda x^{1/2}. \qquad (3.1)$$
Then $x^*$ is uniquely determined by (3.1) when $t\neq\frac{\sqrt[3]{54}}{4}\lambda^{2/3}$, and can be analytically expressed by
$$x^*=h_\lambda(t):=\begin{cases} h_{\lambda,1/2}(t), & \text{if } t>\frac{\sqrt[3]{54}}{4}\lambda^{2/3},\\[2pt] \in\{h_{\lambda,1/2}(t),\,0\}, & \text{if } t=\frac{\sqrt[3]{54}}{4}\lambda^{2/3},\\[2pt] 0, & \text{if } t<\frac{\sqrt[3]{54}}{4}\lambda^{2/3}, \end{cases} \qquad (3.2)$$
where
$$h_{\lambda,1/2}(t)=\frac{2}{3}t\left(1+\cos\Big(\frac{2\pi}{3}-\frac{2}{3}\varphi_\lambda(t)\Big)\right) \qquad (3.3)$$
with
$$\varphi_\lambda(t)=\arccos\left(\frac{\lambda}{8}\Big(\frac{t}{3}\Big)^{-3/2}\right). \qquad (3.4)$$

Proof. First, we consider the positive stationary points of (3.1). The first order optimality condition of (3.1) gives
$$x-t+\frac{\lambda}{4\sqrt{x}}=0. \qquad (3.5)$$
This equation can have positive roots only if $t>0$; if $t\le 0$, then $f(x)$ is increasing on $[0,+\infty)$ and $x^*=0$ is the unique minimizer of (3.1). Hence, we only need to consider $t>0$ from now on.

By solving equation (3.5) and comparing the values of $f$ at each root of (3.5), Xu et al. [44] showed that $\bar x=h_{\lambda,1/2}(t)$ defined by (3.3) is the unique positive stationary point of (3.1) at which $f$ attains the smallest value among all its positive stationary points (see (14), (15) and (16) in [44]; we note that, in (16), $x_i>\frac{\sqrt[3]{54}}{4}\lambda^{2/3}$ is not necessary; in fact, $x_i>0$ is enough). The remaining task is to compare the values $f(\bar x)$ and $f(0)$. Fortunately, Xu et al. (see Lemma 1 and Lemma 2 in [44]) showed that
$$f(\bar x)<f(0)\ \Longleftrightarrow\ t>\frac{\sqrt[3]{54}}{4}\lambda^{2/3} \qquad\text{and}\qquad f(\bar x)=f(0)\ \Longleftrightarrow\ t=\frac{\sqrt[3]{54}}{4}\lambda^{2/3}.$$
The remaining case is naturally
$$f(\bar x)>f(0)\ \Longleftrightarrow\ t<\frac{\sqrt[3]{54}}{4}\lambda^{2/3}.$$
The above three relationships imply
$$x^*=\begin{cases}\bar x, & \text{if } t>\frac{\sqrt[3]{54}}{4}\lambda^{2/3},\\[2pt] \in\{\bar x,\,0\}, & \text{if } t=\frac{\sqrt[3]{54}}{4}\lambda^{2/3},\\[2pt] 0, & \text{if } t<\frac{\sqrt[3]{54}}{4}\lambda^{2/3},\end{cases}$$
which completes the proof.

Figure 1 shows the minimizers of the function $f(x)$ for two different pairs $(t,\lambda)$: in (a) $t=2$, $\lambda=8$, and in (b) $t=4$, $\lambda=8$. In (b), $x=0$ is a local minimizer of $f(x)$; meanwhile, since $t=4>\frac{\sqrt[3]{54}}{4}\lambda^{2/3}=\sqrt[3]{54}\approx 3.78$, the global minimizer is $\bar x=h_{\lambda,1/2}(4)>0$.

Figure 1: The minimizers of the function $f(x)$ with two different pairs of $(t,\lambda)$.

Lemma 3.2 (Appendix A in [44]) If $t>\frac{\sqrt[3]{54}}{4}\lambda^{2/3}$, then the function $h_\lambda(t)$ is strictly increasing.
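As a concrete illustration of Lemma 3.1, the scalar rule (3.2)-(3.4) can be coded directly (a NumPy sketch; the function name is ours). At the boundary case $t=\frac{\sqrt[3]{54}}{4}\lambda^{2/3}$, where both $0$ and $h_{\lambda,1/2}(t)$ are global minimizers, the sketch returns $0$.

```python
import numpy as np

def h_half(t, lam):
    """Scalar half thresholding (3.2)-(3.4): a global minimizer of
    f(x) = (x - t)^2 + lam * sqrt(x) over x >= 0."""
    thresh = (54.0 ** (1.0 / 3.0) / 4.0) * lam ** (2.0 / 3.0)
    if t <= thresh:
        return 0.0
    phi = np.arccos((lam / 8.0) * (t / 3.0) ** (-1.5))                              # (3.4)
    return (2.0 / 3.0) * t * (1.0 + np.cos(2.0 * np.pi / 3.0 - 2.0 * phi / 3.0))    # (3.3)

# Example from Figure 1: t = 4, lam = 8 gives a positive global minimizer (panel (b)),
# while t = 2, lam = 8 is thresholded to zero (panel (a)).
print(h_half(4.0, 8.0), h_half(2.0, 8.0))
```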

Similar to [33, 44], using $h_\lambda(\cdot)$ defined in Lemma 3.1, we can define the following half thresholding function and half thresholding operators.

Definition 3.3 (Half thresholding function) Assume $t\in\mathbb{R}$. For any $\lambda>0$, the function $h_\lambda(\cdot)$ defined by (3.2)-(3.4) is called a half thresholding function.

Definition 3.4 (Vector half thresholding operator) For any $\lambda>0$, the vector half thresholding operator $H_\lambda(\cdot)$ is defined as
$$H_\lambda(x):=\big(h_\lambda(x_1),h_\lambda(x_2),\dots,h_\lambda(x_n)\big)^T,\quad x\in\mathbb{R}^n.$$

Definition 3.5 (Matrix half thresholding operator) Suppose $Y\in\mathbb{R}^{m\times n}$ of rank $r$ admits a singular value decomposition (SVD)
$$Y=U\operatorname{Diag}(\sigma)V^T,$$
where $U$ and $V$ are, respectively, $m\times r$ and $n\times r$ matrices with orthonormal columns, and the vector $\sigma=(\sigma_1,\sigma_2,\dots,\sigma_r)^T$ consists of the positive singular values of $Y$ arranged in nonincreasing order. (Unless specified otherwise, we will always suppose the SVD of a matrix is given in this reduced form.) For any $\lambda>0$, the matrix half thresholding operator $\mathcal{H}_\lambda(\cdot):\mathbb{R}^{m\times n}\to\mathbb{R}^{m\times n}$ is defined by
$$\mathcal{H}_\lambda(Y):=U\operatorname{Diag}(H_\lambda(\sigma))V^T.$$

In what follows, we will see that the matrix half thresholding operator defined above is in fact a proximal operator associated with $\|X\|_{1/2}^{1/2}$, a nonconvex and non-Lipschitz function. In some sense this can be regarded as an extension of the well-known proximal operator associated with convex functions [27, 35].

Lemma 3.6 The global minimizer $X_s$ of the problem
$$\min_{X\in\mathbb{R}^{m\times n}}\ \|X-Y\|_F^2+\lambda\|X\|_{1/2}^{1/2} \qquad (3.6)$$
can be analytically given by $X_s=\mathcal{H}_\lambda(Y)$.

Proof. See the Appendix.
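A minimal NumPy sketch of the matrix half thresholding operator of Definition 3.5 follows (the function name is ours; the scalar rule (3.2)-(3.4) is vectorized inline so the sketch is self-contained). By Lemma 3.6 this is the proximal step for $\lambda\|X\|_{1/2}^{1/2}$.

```python
import numpy as np

def matrix_half_threshold(Y, lam):
    """H_lam(Y) = U Diag(h_lam(sigma)) V^T: apply the scalar half thresholding
    (3.2)-(3.4) to the singular values of Y.  By Lemma 3.6, the result is a
    global minimizer of ||X - Y||_F^2 + lam * ||X||_{1/2}^{1/2}."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    thresh = (54.0 ** (1.0 / 3.0) / 4.0) * lam ** (2.0 / 3.0)
    keep = s > thresh
    s_new = np.zeros_like(s)
    if np.any(keep):
        phi = np.arccos((lam / 8.0) * (s[keep] / 3.0) ** (-1.5))
        s_new[keep] = (2.0 / 3.0) * s[keep] * (1.0 + np.cos(2.0 * np.pi / 3.0 - 2.0 * phi / 3.0))
    return (U * s_new) @ Vt
```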

3.2 Fixed point equation for global minimizers

Now we can consider our $S_{1/2}$ regularization problem (1.7):
$$\min_{X\in\mathbb{R}^{m\times n}}\ \Big\{\|\mathcal{A}(X)-b\|_2^2+\lambda\|X\|_{1/2}^{1/2}\Big\}. \qquad (3.7)$$
For any $\lambda,\mu>0$ and $Z\in\mathbb{R}^{m\times n}$, let
$$C_\lambda(X):=\|\mathcal{A}(X)-b\|_2^2+\lambda\|X\|_{1/2}^{1/2}, \qquad (3.8)$$
$$C_{\lambda,\mu}(X,Z):=\mu\big(C_\lambda(X)-\|\mathcal{A}(X)-\mathcal{A}(Z)\|_2^2\big)+\|X-Z\|_F^2, \qquad (3.9)$$
$$B_\mu(Z):=Z+\mu\mathcal{A}^*(b-\mathcal{A}(Z)). \qquad (3.10)$$

Lemma 3.7 If $X_s\in\mathbb{R}^{m\times n}$ is a global minimizer of $C_{\lambda,\mu}(X,Z)$ for any fixed $\lambda$, $\mu$ and $Z$, then $X_s$ can be analytically expressed by
$$X_s=\mathcal{H}_{\lambda\mu}(B_\mu(Z)). \qquad (3.11)$$

Proof. Note that $C_{\lambda,\mu}(X,Z)$ can be rewritten as
$$\begin{aligned} C_{\lambda,\mu}(X,Z)&=\mu\big(\|\mathcal{A}(X)-b\|_2^2+\lambda\|X\|_{1/2}^{1/2}-\|\mathcal{A}(X)-\mathcal{A}(Z)\|_2^2\big)+\|X-Z\|_F^2\\ &=\|X\|_F^2+2\mu\langle\mathcal{A}(X),\mathcal{A}(Z)\rangle-2\mu\langle\mathcal{A}(X),b\rangle-2\langle X,Z\rangle+\lambda\mu\|X\|_{1/2}^{1/2}+\|Z\|_F^2+\mu\|b\|_2^2-\mu\|\mathcal{A}(Z)\|_2^2\\ &=\|X\|_F^2-2\big\langle X,\,Z+\mu\mathcal{A}^*(b-\mathcal{A}(Z))\big\rangle+\lambda\mu\|X\|_{1/2}^{1/2}+\|Z\|_F^2+\mu\|b\|_2^2-\mu\|\mathcal{A}(Z)\|_2^2\\ &=\|X\|_F^2-2\langle X,B_\mu(Z)\rangle+\lambda\mu\|X\|_{1/2}^{1/2}+\|Z\|_F^2+\mu\|b\|_2^2-\mu\|\mathcal{A}(Z)\|_2^2\\ &=\|X-B_\mu(Z)\|_F^2+\lambda\mu\|X\|_{1/2}^{1/2}+\|Z\|_F^2+\mu\|b\|_2^2-\mu\|\mathcal{A}(Z)\|_2^2-\|B_\mu(Z)\|_F^2. \end{aligned}$$
This implies that minimizing $C_{\lambda,\mu}(X,Z)$ for any fixed $\lambda$, $\mu$ and $Z$ is equivalent to solving
$$\min_{X\in\mathbb{R}^{m\times n}}\ \big\{\|X-B_\mu(Z)\|_F^2+\lambda\mu\|X\|_{1/2}^{1/2}\big\}.$$
By applying Lemma 3.6 with $Y=B_\mu(Z)$, we obtain expression (3.11).

Lemma 3.8 Let $\lambda$ and $\mu$ be two fixed numbers satisfying $\lambda>0$ and $0<\mu\le\|\mathcal{A}\|^{-2}$. If $X^*$ is a global minimizer of $C_\lambda(X)$, then $X^*$ is also a global minimizer of $C_{\lambda,\mu}(X,X^*)$, that is,
$$C_{\lambda,\mu}(X^*,X^*)\le C_{\lambda,\mu}(X,X^*)\quad\text{for all } X\in\mathbb{R}^{m\times n}. \qquad (3.12)$$

Proof. Since $0<\mu\le\|\mathcal{A}\|^{-2}$, we have $\|X-X^*\|_F^2-\mu\|\mathcal{A}(X)-\mathcal{A}(X^*)\|_2^2\ge 0$. Hence, for any $X\in\mathbb{R}^{m\times n}$,
$$\begin{aligned} C_{\lambda,\mu}(X,X^*)&=\mu\big(C_\lambda(X)-\|\mathcal{A}(X)-\mathcal{A}(X^*)\|_2^2\big)+\|X-X^*\|_F^2\\ &=\mu\,C_\lambda(X)+\big(\|X-X^*\|_F^2-\mu\|\mathcal{A}(X)-\mathcal{A}(X^*)\|_2^2\big)\\ &\ge\mu\,C_\lambda(X)\ \ge\ \mu\,C_\lambda(X^*)\ =\ C_{\lambda,\mu}(X^*,X^*), \end{aligned}$$
where the last inequality is due to the fact that $X^*$ is a global minimizer of $C_\lambda(X)$. The proof is thus complete.

By applying Lemmas 3.7 and 3.8, we can now derive the main result of this section.

Theorem 3.9 Let $\lambda>0$ and $0<\mu\le\|\mathcal{A}\|^{-2}$. Let $X^*$ be a global minimizer of problem (1.7) and let $B_\mu(X^*)=X^*+\mu\mathcal{A}^*(b-\mathcal{A}(X^*))$ admit the SVD
$$B_\mu(X^*)=U^*\operatorname{Diag}\big(\sigma(B_\mu(X^*))\big){V^*}^T. \qquad (3.13)$$
Then $X^*$ satisfies the fixed point equation
$$X^*=\mathcal{H}_{\lambda\mu}(B_\mu(X^*)). \qquad (3.14)$$

In particular, one can express
$$[\sigma(X^*)]_i=h_{\lambda\mu}\big([\sigma(B_\mu(X^*))]_i\big)=\begin{cases} h_{\lambda\mu,1/2}\big([\sigma(B_\mu(X^*))]_i\big), & \text{if } [\sigma(B_\mu(X^*))]_i>\frac{\sqrt[3]{54}}{4}(\lambda\mu)^{2/3},\\[2pt] \in\big\{h_{\lambda\mu,1/2}([\sigma(B_\mu(X^*))]_i),\,0\big\}, & \text{if } [\sigma(B_\mu(X^*))]_i=\frac{\sqrt[3]{54}}{4}(\lambda\mu)^{2/3},\\[2pt] 0, & \text{if } [\sigma(B_\mu(X^*))]_i<\frac{\sqrt[3]{54}}{4}(\lambda\mu)^{2/3}. \end{cases} \qquad (3.15)$$
Moreover, we have either
$$[\sigma(X^*)]_i\ge\frac{\sqrt[3]{54}}{6}(\lambda\mu)^{2/3}\quad\text{or}\quad[\sigma(X^*)]_i=0. \qquad (3.16)$$

Proof. Since $X^*$ is a global minimizer of $C_\lambda(X)$, by Lemma 3.8, $X^*$ is also a global minimizer of $C_{\lambda,\mu}(X,X^*)$. Consequently, by Lemma 3.7, $X^*$ satisfies equation (3.14), and (3.15) is a componentwise re-expression of (3.14). According to (3.2)-(3.4), by direct computation we have
$$\lim_{t\to\frac{\sqrt[3]{54}}{4}(\lambda\mu)^{2/3}}\varphi_{\lambda\mu}(t)=\frac{\pi}{4}\qquad\text{and}\qquad\lim_{t\to\frac{\sqrt[3]{54}}{4}(\lambda\mu)^{2/3}}h_{\lambda\mu}(t)=\frac{\sqrt[3]{54}}{6}(\lambda\mu)^{2/3}.$$
This limit, together with the strict monotonicity of $h_{\lambda\mu}$ on $t>\frac{\sqrt[3]{54}}{4}(\lambda\mu)^{2/3}$ (Lemma 3.2), implies that $[\sigma(X^*)]_i>\frac{\sqrt[3]{54}}{6}(\lambda\mu)^{2/3}$ whenever $[\sigma(B_\mu(X^*))]_i>\frac{\sqrt[3]{54}}{4}(\lambda\mu)^{2/3}$. The last case of (3.15) shows that $[\sigma(X^*)]_i=0$ whenever $[\sigma(B_\mu(X^*))]_i<\frac{\sqrt[3]{54}}{4}(\lambda\mu)^{2/3}$. Thus, (3.16) is derived.

Theorem 3.9 provides not only the lower bound estimate $\frac{\sqrt[3]{54}}{6}(\lambda\mu)^{2/3}$ for the nonzero singular values of the global minimizers of the $S_{1/2}$ regularization problem, but also a globally necessary optimality condition in the form of a fixed point equation associated with the matrix half thresholding operator $\mathcal{H}_{\lambda\mu}(\cdot)$. On the one hand, it is analogous to the fixed point condition of the nuclear norm regularization solution associated with the so-called singular value shrinkage operator (see, e.g., [4, 29]). On the other hand, the half thresholding operator involved here is more complicated than the singular value shrinkage operator, because our minimization problem is nonconvex, nonsmooth and non-Lipschitz.

Definition 3.10 We call $X^*$ a global stationary point of problem (1.7) if there exists $0<\mu\le\|\mathcal{A}\|^{-2}$ such that $X^*$ satisfies the fixed point equation (3.14).

4 Fixed point iteration and its convergence

According to the fixed point equation (3.14), a fixed point iterative formula for the $S_{1/2}$ regularization problem (1.7) can be naturally proposed as follows: given $X_0$,
$$X_{k+1}=\mathcal{H}_{\lambda\mu}\big(X_k+\mu\mathcal{A}^*(b-\mathcal{A}(X_k))\big). \qquad (4.1)$$
To simplify the iterations and to favor low rank solutions, we slightly adjust $h_{\lambda\mu}$ in (4.1) as follows:
$$h_{\lambda\mu}(t):=\begin{cases} h_{\lambda\mu,1/2}(t), & \text{if } t>\frac{\sqrt[3]{54}}{4}(\lambda\mu)^{2/3},\\[2pt] 0, & \text{otherwise}. \end{cases} \qquad (4.2)$$
The adjustment here is to choose $h_{\lambda\mu}(t)=0$ when $t=\frac{\sqrt[3]{54}}{4}(\lambda\mu)^{2/3}$.
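A sketch of iteration (4.1) in NumPy (the names are ours; the proximal map `prox` stands for the matrix half thresholding operator with the adjusted rule (4.2), e.g. `matrix_half_threshold` from the sketch after Lemma 3.6; the tolerance and iteration cap are assumed practical choices, not from the paper).

```python
import numpy as np

def fixed_point_step(X, A_op, A_adj, b, lam, mu, prox):
    """One iteration of (4.1): X_{k+1} = H_{lam*mu}(B_mu(X_k)) with
    B_mu(X) = X + mu * A^*(b - A(X)); prox(B, nu) applies the singular value
    half thresholding with parameter nu = lam * mu."""
    B = X + mu * A_adj(b - A_op(X))
    return prox(B, lam * mu)

def fixed_point_iterate(X0, A_op, A_adj, b, lam, mu, prox, tol=1e-6, max_iter=500):
    """Iterate (4.1) until the relative change ||X_{k+1} - X_k||_F is small."""
    X = X0
    for _ in range(max_iter):
        X_new = fixed_point_step(X, A_op, A_adj, b, lam, mu, prox)
        if np.linalg.norm(X_new - X, 'fro') <= tol * max(1.0, np.linalg.norm(X, 'fro')):
            return X_new
        X = X_new
    return X
```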

Next, let us analyze the convergence of the above fixed point iteration.

Theorem 4.1 Given $\lambda>0$, choose $0<\mu<\|\mathcal{A}\|^{-2}$. Let $\{X_k\}$ be the sequence generated by iteration (4.1). Then:
(i) $\{C_\lambda(X_k)\}$ is strictly monotonically decreasing and converges to $C_\lambda(X^*)$, where $X^*$ is any accumulation point of $\{X_k\}$.
(ii) $\{X_k\}$ is asymptotically regular, that is, $\lim_{k\to\infty}\|X_{k+1}-X_k\|_F=0$.
(iii) Any accumulation point of $\{X_k\}$ is a global stationary point of problem (1.7).

Proof. (i) Let $C_\lambda(X)$, $C_{\lambda,\mu}(X,Z)$ and $B_\mu(Z)$ be defined by (3.8)-(3.10), and let $B_\mu(Z)$ admit the SVD $B_\mu(Z)=U\operatorname{Diag}(\sigma)V^T$ with $U\in\mathbb{R}^{m\times r}$, $V\in\mathbb{R}^{n\times r}$ and $\sigma\in\mathbb{R}^r_{++}$. From Lemma 3.7, we have
$$C_{\lambda,\mu}\big(\mathcal{H}_{\lambda\mu}(B_\mu(Z)),Z\big)=\min_X C_{\lambda,\mu}(X,Z),$$
and therefore
$$C_{\lambda,\mu}(X_{k+1},X_k)=\min_X C_{\lambda,\mu}(X,X_k), \qquad (4.3)$$
where $X_{k+1}=\mathcal{H}_{\lambda\mu}(B_\mu(X_k))=U_k\operatorname{Diag}(H_{\lambda\mu}(\sigma^k))V_k^T$ and $U_k\operatorname{Diag}(\sigma^k)V_k^T$ is the SVD of $B_\mu(X_k)$. Since $0<\mu<\|\mathcal{A}\|^{-2}$, we have
$$\|\mathcal{A}(X_{k+1})-\mathcal{A}(X_k)\|_2^2-\frac{1}{\mu}\|X_{k+1}-X_k\|_F^2<0.$$
Hence,
$$\begin{aligned} C_\lambda(X_{k+1})&=\frac{1}{\mu}\Big(C_{\lambda,\mu}(X_{k+1},X_k)-\|X_{k+1}-X_k\|_F^2\Big)+\|\mathcal{A}(X_{k+1})-\mathcal{A}(X_k)\|_2^2\\ &\le\frac{1}{\mu}\Big(C_{\lambda,\mu}(X_k,X_k)-\|X_{k+1}-X_k\|_F^2\Big)+\|\mathcal{A}(X_{k+1})-\mathcal{A}(X_k)\|_2^2\\ &=\frac{1}{\mu}C_{\lambda,\mu}(X_k,X_k)+\Big(\|\mathcal{A}(X_{k+1})-\mathcal{A}(X_k)\|_2^2-\frac{1}{\mu}\|X_{k+1}-X_k\|_F^2\Big)\\ &<\frac{1}{\mu}C_{\lambda,\mu}(X_k,X_k)=C_\lambda(X_k), \end{aligned}$$
which shows that $\{C_\lambda(X_k)\}$ is strictly monotonically decreasing. Since $\{C_\lambda(X_k)\}$ is bounded from below, it converges to a constant $C^*$. From $\{X_k\}\subset\{X:C_\lambda(X)\le C_\lambda(X_0)\}$, which is bounded, it follows that $\{X_k\}$ is bounded and therefore has at least one accumulation point. Let $X^*$ be an accumulation point of $\{X_k\}$. By the continuity of $C_\lambda(X)$ and the convergence of $\{C_\lambda(X_k)\}$, we get $C_\lambda(X_k)\to C^*=C_\lambda(X^*)$ as $k\to+\infty$.

(ii) Since $0<\mu<\|\mathcal{A}\|^{-2}$, we have $0<\delta:=1-\mu\|\mathcal{A}\|^2<1$ and
$$\|X_{k+1}-X_k\|_F^2\le\frac{1}{\delta}\Big(\|X_{k+1}-X_k\|_F^2-\mu\|\mathcal{A}(X_{k+1})-\mathcal{A}(X_k)\|_2^2\Big).$$
From (3.8), (3.9) and (4.3), we derive
$$\mu\big[C_\lambda(X_k)-C_\lambda(X_{k+1})\big]=C_{\lambda,\mu}(X_k,X_k)-\mu C_\lambda(X_{k+1})\ \ge\ C_{\lambda,\mu}(X_{k+1},X_k)-\mu C_\lambda(X_{k+1})=\|X_{k+1}-X_k\|_F^2-\mu\|\mathcal{A}(X_{k+1})-\mathcal{A}(X_k)\|_2^2.$$

The above two inequalities yield that, for any positive integer $K$,
$$\sum_{k=0}^{K}\|X_{k+1}-X_k\|_F^2\le\frac{1}{\delta}\sum_{k=0}^{K}\Big(\|X_{k+1}-X_k\|_F^2-\mu\|\mathcal{A}(X_{k+1})-\mathcal{A}(X_k)\|_2^2\Big)\le\frac{\mu}{\delta}\sum_{k=0}^{K}\big(C_\lambda(X_k)-C_\lambda(X_{k+1})\big)=\frac{\mu}{\delta}\big(C_\lambda(X_0)-C_\lambda(X_{K+1})\big)\le\frac{\mu}{\delta}C_\lambda(X_0).$$
Hence, $\sum_{k=0}^{\infty}\|X_{k+1}-X_k\|_F^2<+\infty$, and so $\|X_{k+1}-X_k\|_F\to 0$ as $k\to+\infty$. Thus, $\{X_k\}$ is asymptotically regular.

(iii) Let $\{X_{k_j}\}$ be a convergent subsequence of $\{X_k\}$ and let $X^*$ be its limit point, i.e.,
$$X_{k_j}\to X^*,\quad\text{as } k_j\to+\infty. \qquad (4.4)$$
From this limit, we derive
$$B_\mu(X_{k_j})=X_{k_j}+\mu\mathcal{A}^*(b-\mathcal{A}(X_{k_j}))\ \to\ X^*+\mu\mathcal{A}^*(b-\mathcal{A}(X^*))=B_\mu(X^*),\quad\text{as } k_j\to+\infty,$$
i.e.,
$$U_{k_j}\operatorname{Diag}(\sigma^{k_j})V_{k_j}^T\ \to\ U^*\operatorname{Diag}(\sigma^*){V^*}^T,\quad\text{as } k_j\to+\infty, \qquad (4.5)$$
where $B_\mu(X_{k_j})=U_{k_j}\operatorname{Diag}(\sigma^{k_j})V_{k_j}^T$ and $B_\mu(X^*)=U^*\operatorname{Diag}(\sigma^*){V^*}^T$ are the SVDs of $B_\mu(X_{k_j})$ and $B_\mu(X^*)$, respectively. According to (4.5) and [22, Corollary 7.3.8], we have
$$[\sigma^{k_j}]_i\to[\sigma^*]_i\quad\text{for each } i=1,\dots,r,\quad\text{as } k_j\to+\infty, \qquad (4.6)$$
where $r$ is the rank of $B_\mu(X^*)$. By the selection principle (see, e.g., [22, Lemma 2.1.8]), we can suppose that
$$U_{k_j}\to\bar U,\quad \operatorname{Diag}(\sigma^{k_j})\to\operatorname{Diag}(\sigma^*),\quad V_{k_j}\to\bar V,\quad\text{as } k_j\to+\infty, \qquad (4.7)$$
for some $\bar U\in\mathbb{R}^{m\times r}$ and $\bar V\in\mathbb{R}^{n\times r}$, both with orthonormal columns. From (4.7), we get $U_{k_j}\operatorname{Diag}(\sigma^{k_j})V_{k_j}^T\to\bar U\operatorname{Diag}(\sigma^*)\bar V^T$. This, together with (4.5), implies
$$\bar U\operatorname{Diag}(\sigma^*)\bar V^T=U^*\operatorname{Diag}(\sigma^*){V^*}^T=B_\mu(X^*). \qquad (4.8)$$
The limit (4.4) and the asymptotic regularity of $\{X_k\}$ imply
$$\|X_{k_j+1}-X^*\|_F\le\|X_{k_j+1}-X_{k_j}\|_F+\|X_{k_j}-X^*\|_F\to 0,\quad\text{as } k_j\to+\infty,$$
which verifies that $\{X_{k_j+1}\}$ also converges to $X^*$. Note that $X_{k_j+1}=U_{k_j}\operatorname{Diag}(H_{\lambda\mu}(\sigma^{k_j}))V_{k_j}^T$, which together with $X_{k_j+1}\to X^*$ yields
$$U_{k_j}\operatorname{Diag}(H_{\lambda\mu}(\sigma^{k_j}))V_{k_j}^T\to X^*,\quad\text{as } k_j\to+\infty. \qquad (4.9)$$

If it holds that
$$h_{\lambda\mu}([\sigma^{k_j}]_i)\to h_{\lambda\mu}([\sigma^*]_i)\quad\text{for each } i=1,2,\dots,r,\quad\text{as } k_j\to+\infty, \qquad (4.10)$$
then from (4.7), (4.10) and (4.8), we get
$$U_{k_j}\operatorname{Diag}(H_{\lambda\mu}(\sigma^{k_j}))V_{k_j}^T\ \to\ \bar U\operatorname{Diag}(H_{\lambda\mu}(\sigma^*))\bar V^T=\mathcal{H}_{\lambda\mu}(B_\mu(X^*)),\quad\text{as } k_j\to+\infty,$$
where the last equality is due to the well-definedness¹ of $\mathcal{H}_{\lambda\mu}(\cdot)$. This limit, together with (4.9), gives $X^*=\mathcal{H}_{\lambda\mu}(B_\mu(X^*))$, that is, $X^*$ is a global stationary point of problem (1.7).

It remains to prove that (4.10) is true. For $i=1,\dots,r$, if $[\sigma^*]_i<\frac{\sqrt[3]{54}}{4}(\lambda\mu)^{2/3}$, then by (4.6), $[\sigma^{k_j}]_i<\frac{\sqrt[3]{54}}{4}(\lambda\mu)^{2/3}$ when $k_j$ is sufficiently large. This inequality, together with the definition of $h_{\lambda\mu}$ in (4.2), gives $h_{\lambda\mu}([\sigma^{k_j}]_i)=0\to h_{\lambda\mu}([\sigma^*]_i)=0$ as $k_j\to+\infty$.

If $[\sigma^*]_i>\frac{\sqrt[3]{54}}{4}(\lambda\mu)^{2/3}$, then by (4.6), $[\sigma^{k_j}]_i>\frac{\sqrt[3]{54}}{4}(\lambda\mu)^{2/3}$ when $k_j$ is sufficiently large. Note that although $h_{\lambda\mu}(\cdot)$ defined by (4.2) is not continuous on $[0,+\infty)$, it is continuous on $\big(\frac{\sqrt[3]{54}}{4}(\lambda\mu)^{2/3},+\infty\big)$. So it follows from $[\sigma^{k_j}]_i\to[\sigma^*]_i$ that $h_{\lambda\mu}([\sigma^{k_j}]_i)\to h_{\lambda\mu}([\sigma^*]_i)$ as $k_j\to+\infty$.

If $[\sigma^*]_i=\frac{\sqrt[3]{54}}{4}(\lambda\mu)^{2/3}$, then since $[\sigma^{k_j}]_i\to[\sigma^*]_i$, there are two possible cases.

Case 1: There is a subsequence of $\{[\sigma^{k_j}]_i\}$, say $\{[\sigma^{k_{j_m}}]_i\}$, converging to $[\sigma^*]_i$ such that $[\sigma^{k_{j_m}}]_i\le[\sigma^*]_i$ for each $k_{j_m}$. In this case, we have $h_{\lambda\mu}([\sigma^{k_{j_m}}]_i)=0\to h_{\lambda\mu}([\sigma^*]_i)=0$ as $k_{j_m}\to+\infty$.

Case 2: There is a subsequence of $\{[\sigma^{k_j}]_i\}$, say $\{[\sigma^{k_{j_n}}]_i\}$, converging to $[\sigma^*]_i$ such that $[\sigma^{k_{j_n}}]_i>[\sigma^*]_i=\frac{\sqrt[3]{54}}{4}(\lambda\mu)^{2/3}$ for each $k_{j_n}$. However, we will verify that this case can never happen as long as $\mu$ is chosen appropriately. If Case 2 happens, there is a large integer $N_1$ such that $[\sigma^{k_{j_n}}]_i\in\big(\frac{\sqrt[3]{54}}{4}(\lambda\mu)^{2/3},\ \frac{\sqrt[3]{54}}{3}(\lambda\mu)^{2/3}\big)$ holds for any $k_{j_n}\ge N_1$. By (ii), $\|X_{k_{j_n}+1}-X_{k_{j_n}}\|_F\to 0$ as $k_{j_n}\to+\infty$. Then there is a large integer $N_2\ge N_1$ such that
$$[\sigma^{k_{j_n}+1}]_i\in\Big(\tfrac{\sqrt[3]{54}}{4}(\lambda\mu)^{2/3},\ \tfrac{\sqrt[3]{54}}{3}(\lambda\mu)^{2/3}\Big) \qquad (4.11)$$
holds for any $k_{j_n}\ge N_2$.

¹ The matrix half thresholding operator $\mathcal{H}_{\lambda\mu}:\mathbb{R}^{m\times n}\to\mathbb{R}^{m\times n}$ here is in fact a non-symmetric Löwner operator [38] associated with the half thresholding function $h_{\lambda\mu}:\mathbb{R}\to\mathbb{R}$. The non-symmetric Löwner operator $\mathcal{H}_{\lambda\mu}:\mathbb{R}^{m\times n}\to\mathbb{R}^{m\times n}$ is called well-defined if it is independent of the choice of the matrices $U$ and $V$ in the SVD. In other words, if $Y\in\mathbb{R}^{m\times n}$ has two different SVDs, say $Y=U_1\operatorname{Diag}(\sigma)V_1^T=U_2\operatorname{Diag}(\sigma)V_2^T$, then $\mathcal{H}_{\lambda\mu}(Y)=U_1\operatorname{Diag}(h_{\lambda\mu}(\sigma_1),\dots,h_{\lambda\mu}(\sigma_m))V_1^T=U_2\operatorname{Diag}(h_{\lambda\mu}(\sigma_1),\dots,h_{\lambda\mu}(\sigma_m))V_2^T$. Theorem 1 of Lecture III in [38] proves that a non-symmetric Löwner operator $H:\mathbb{R}^{m\times n}\to\mathbb{R}^{m\times n}$ associated with a scalar-valued function $h:\mathbb{R}_+\to\mathbb{R}_+$ is well-defined if and only if $h(0)=0$. By this theorem, our matrix half thresholding operator $\mathcal{H}_{\lambda\mu}$ is well-defined since $h_{\lambda\mu}(0)=0$.

On the other hand, since $B_\mu(X_{k_{j_n}})=X_{k_{j_n}}+\mu\mathcal{A}^*(b-\mathcal{A}(X_{k_{j_n}}))$ is continuous in $\mu$ and $B_\mu(X_{k_{j_n}})\to X_{k_{j_n}}$ as $\mu\to 0$, we know that if $\mu$ is chosen sufficiently small, $[\sigma(B_\mu(X_{k_{j_n}}))]_i$ will be close to $[\sigma^{k_{j_n}}]_i$. Let $\mu$ be chosen such that $[\sigma(B_\mu(X_{k_{j_n}}))]_i\in\big(\frac{\sqrt[3]{54}}{4}(\lambda\mu)^{2/3},\ \frac{\sqrt[3]{54}}{3}(\lambda\mu)^{2/3}\big)$ holds for any $k_{j_n}\ge N_2$. According to (3.2)-(3.4), by direct computation we know that
$$\lim_{t\to\frac{\sqrt[3]{54}}{4}(\lambda\mu)^{2/3}}\varphi_{\lambda\mu}(t)=\frac{\pi}{4}\qquad\text{and}\qquad\lim_{t\to\frac{\sqrt[3]{54}}{4}(\lambda\mu)^{2/3}}h_{\lambda\mu}(t)=\frac{\sqrt[3]{54}}{6}(\lambda\mu)^{2/3}.$$
Note that $[\sigma^{k_{j_n}+1}]_i=h_{\lambda\mu}\big([\sigma(B_\mu(X_{k_{j_n}}))]_i\big)$ and that $h_{\lambda\mu}(\cdot)$ is increasing on $\big(\frac{\sqrt[3]{54}}{4}(\lambda\mu)^{2/3},+\infty\big)$ (Lemma 3.2); then there is a large integer $N_3\ge N_2$ such that
$$[\sigma^{k_{j_n}+1}]_i=h_{\lambda\mu}\big([\sigma(B_\mu(X_{k_{j_n}}))]_i\big)\in\Big(\tfrac{\sqrt[3]{54}}{6}(\lambda\mu)^{2/3},\ \tfrac{\sqrt[3]{54}}{4}(\lambda\mu)^{2/3}\Big) \qquad (4.12)$$
holds for any $k_{j_n}\ge N_3$. One can see that (4.12) contradicts (4.11). This contradiction shows that Case 2 can never happen as long as $\mu$ is chosen appropriately. Therefore, (4.10) is true, and the proof is complete.

5 Setting of parameters and fixed point continuation

In this section, we discuss the problem of parameter selection in our algorithm. As is well known, the quality of solutions to regularized optimization problems depends heavily on the setting of the regularization parameter $\lambda$, but the selection of proper parameters is a very hard problem and there is no optimal rule in general. Nevertheless, when some prior information (e.g., low rank) is known for a problem, it is realistic to set the regularization parameter more reasonably.

5.1 Location of the optimal regularization parameter

We begin by locating the optimal $\lambda^*$, which then serves as the basis of the parameter setting strategy used in the algorithm to be proposed. Specifically, suppose that a problem can be formulated in the $S_{1/2}$ regularization form (1.7) whose solutions are matrices of rank $r$; thus, we are required to solve the $S_{1/2}$ regularization problem restricted to the subregion $\{X\in\mathbb{R}^{m\times n}\mid\operatorname{rank}(X)=r\}$. For any $\mu$, denote $B_\mu(X)=X+\mu\mathcal{A}^*(b-\mathcal{A}(X))$. Assume $X^*$ is a solution to the $S_{1/2}$ regularization problem and $\sigma(B_\mu(X^*))$ is arranged in nonincreasing order. By Theorem 3.9 (particularly (3.16)) and (4.2), we have
$$[\sigma(B_\mu(X^*))]_i>\frac{\sqrt[3]{54}}{4}(\lambda^*\mu)^{2/3}\ \Longleftrightarrow\ [\sigma(X^*)]_i>\frac{\sqrt[3]{54}}{6}(\lambda^*\mu)^{2/3},\quad i\in\{1,2,\dots,r\},$$
and
$$[\sigma(B_\mu(X^*))]_i\le\frac{\sqrt[3]{54}}{4}(\lambda^*\mu)^{2/3}\ \Longleftrightarrow\ [\sigma(X^*)]_i=0,\quad i\in\{r+1,r+2,\dots,m\},$$
which implies
$$\frac{\sqrt{96}}{9\mu}\big([\sigma(B_\mu(X^*))]_{r+1}\big)^{3/2}\ \le\ \lambda^*\ <\ \frac{\sqrt{96}}{9\mu}\big([\sigma(B_\mu(X^*))]_{r}\big)^{3/2}.$$

The above estimate provides an exact location of where the optimal parameter lies. We can then take
$$\lambda^*=\frac{\sqrt{96}}{9\mu}\Big((1-\alpha)\big([\sigma(B_\mu(X^*))]_{r+1}\big)^{3/2}+\alpha\big([\sigma(B_\mu(X^*))]_{r}\big)^{3/2}\Big)$$
with any $\alpha\in[0,1)$. In particular, a most reliable choice of $\lambda^*$ is
$$\lambda^*=\frac{\sqrt{96}}{9\mu}\big([\sigma(B_\mu(X^*))]_{r+1}\big)^{3/2}. \qquad (5.1)$$
Of course, it may not be the best choice, since the larger $\lambda$ is, the larger the threshold value $\frac{\sqrt[3]{54}}{4}(\lambda\mu)^{2/3}$ becomes, and the lower the rank of the solution produced by the thresholding algorithm. We also note that formula (5.1) is valid for any fixed $\mu$; we will use it with a fixed $\mu_0$ satisfying $0<\mu_0<\|\mathcal{A}\|^{-2}$ below. In applications, we may use $X_k$ instead of the true solution $X^*$ and the rank of $X_k$ instead of $r+1$, that is, we can take
$$\lambda_{k+1}=\frac{\sqrt{96}}{9\mu_0}\big([\sigma(X_k)]_{r_k}\big)^{3/2}, \qquad (5.2)$$
where $r_k$ is the rank of $X_k$. More often, we can also take
$$\lambda_{k+1}=\max\Big\{\bar\lambda,\ \min\Big\{\eta\lambda_k,\ \frac{\sqrt{96}}{9\mu_0}\big([\sigma(X_k)]_{r_k}\big)^{3/2}\Big\}\Big\}, \qquad (5.3)$$
where $\bar\lambda$ is a sufficiently small positive real number, $\eta\in(0,1)$ is a constant, and $r_k$ is the rank of $X_k$. In this case, $\{\lambda_k\}$ remains monotonically decreasing. In the next subsection, one will see that (5.3) may accelerate the iteration.
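A small NumPy sketch of the continuation rule (5.3) follows (the function name and the numerical rank cutoff `tol` are our choices; the defaults $\eta=1/4$ and $\bar\lambda=10^{-4}$ are taken from Table 1 in Section 7).

```python
import numpy as np

def update_lambda(X_k, lam_k, mu0, eta=0.25, lam_bar=1e-4, tol=1e-8):
    """Rule (5.3): lam_{k+1} = max(lam_bar, min(eta*lam_k,
    sqrt(96)/(9*mu0) * sigma_{r_k}(X_k)^{3/2})), where r_k is the (numerical)
    rank of X_k and sigma_{r_k}(X_k) is its smallest nonzero singular value."""
    s = np.linalg.svd(X_k, compute_uv=False)
    s_pos = s[s > tol]
    sigma_rk = s_pos[-1] if s_pos.size > 0 else 0.0
    candidate = np.sqrt(96.0) / (9.0 * mu0) * sigma_rk ** 1.5
    return max(lam_bar, min(eta * lam_k, candidate))
```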

5.2 Interpretation as a method of fixed point continuation

In this subsection, we recast (5.3) as a continuation technique (i.e., a homotopy approach) that accelerates the convergence of the fixed point iteration. In [21], Hale et al. describe a continuation technique to accelerate the convergence of the fixed point iteration for the $\ell_1$ regularization problem. Inspired by this work, Ma et al. [29] provide a similar continuation technique to accelerate the convergence of the fixed point iteration for the nuclear norm regularization problem. As shown in [21, 29], this continuation technique considerably improves the convergence speed of fixed point iterations. The main idea of their continuation technique, explained in our context, is to choose a decreasing sequence $\{\lambda_k\}$: $\lambda_1>\lambda_2>\dots>\lambda_L=\bar\lambda>0$, and then, in the $k$th iteration, use $\lambda=\lambda_k$. Therefore, formula (5.3) coincides with this continuation technique. Generally speaking, our algorithm can be regarded as a fixed point continuation algorithm, but one applied to a nonsmooth, nonconvex and non-Lipschitz optimization problem. Thus, a fixed point iterative algorithm based on the half norm of matrices for problem (1.7) can be specified as follows.

Algorithm 5.2. Half Norm Fixed Point algorithm (HFP algorithm)

Given the linear operator $\mathcal{A}:\mathbb{R}^{m\times n}\to\mathbb{R}^p$ and the vector $b\in\mathbb{R}^p$; set the parameters $\mu_0>0$, $\bar\lambda>0$ and $\eta\in(0,1)$.
- Initialize: choose the initial values $\{X_0,\lambda_1\}$ with $\lambda_1\ge\bar\lambda$; set $X=X_0$ and $\lambda=\lambda_1$.
- for $k=1:\text{maxiter}$, do $\lambda=\lambda_k$,
  - while NOT converged, do
    - compute $B=X+\mu_0\mathcal{A}^*(b-\mathcal{A}(X))$ and its SVD, say $B=U\operatorname{Diag}(\sigma)V^T$;
    - compute $X=U\operatorname{Diag}(H_{\lambda\mu_0}(\sigma))V^T$;
  - end while, and output $X_k$, $\sigma^k$, $r_k=\operatorname{rank}(X_k)$;
  - set $\lambda_{k+1}=\max\big\{\bar\lambda,\ \min\big\{\eta\lambda_k,\ \frac{\sqrt{96}}{9\mu_0}([\sigma(X_k)]_{r_k})^{3/2}\big\}\big\}$;
  - if $\lambda_{k+1}=\bar\lambda$, return;
- end for

In Algorithm 5.2, the positive integer maxiter is large enough that convergence of the outer loop can be ensured.

5.3 Stopping criteria for inner loops

Note that in the half norm fixed point algorithm, the $k$th inner loop solves problem (1.7) for a fixed $\lambda=\lambda_k$. We must determine when to stop this inner iteration and move on to the next one. When $X_k$ gets close to an optimal solution $X^*$, the distance between $X_k$ and $X_{k+1}$ should become very small. Hence, we can use the criterion
$$\frac{\|X_{k+1}-X_k\|_F}{\max\{1,\|X_k\|_F\}}<\text{xtol}, \qquad (5.4)$$
where xtol is a small positive number. Besides this stopping criterion, we use $I_m$ to control the maximum number of inner loops, i.e., if the stopping rule (5.4) is not satisfied after $I_m$ iterations, we terminate the subproblem and update $\lambda$ to start the next subproblem.
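Putting the pieces together, the following is a compact, hedged sketch of Algorithm 5.2 in NumPy, using a full SVD (the HFPA variant of Section 6 would replace it with an approximate SVD). The parameter defaults and the initialization $X_0=\mathcal{A}^*(b)$, $\lambda_1=\min\{3,mn/p\}\,\|\mathcal{A}^*(b)\|_2$ follow Table 1 in Section 7; this is an illustrative re-implementation, not the code used in the experiments.

```python
import numpy as np

def half_threshold_sv(s, nu):
    """Half thresholding of a vector of singular values, parameter nu = lam*mu0."""
    thresh = (54.0 ** (1.0 / 3.0) / 4.0) * nu ** (2.0 / 3.0)
    out = np.zeros_like(s)
    keep = s > thresh
    if np.any(keep):
        phi = np.arccos((nu / 8.0) * (s[keep] / 3.0) ** (-1.5))
        out[keep] = (2.0 / 3.0) * s[keep] * (1.0 + np.cos(2.0 * np.pi / 3.0 - 2.0 * phi / 3.0))
    return out

def hfp(A_op, A_adj, b, mu0, lam_bar=1e-4, eta=0.25, maxiter=10000, I_m=10, xtol=1e-4):
    """Sketch of Algorithm 5.2 (HFP).  A_op/A_adj implement A and A^*;
    mu0 should satisfy 0 < mu0 < 1/||A||^2 (for matrix completion ||A|| = 1)."""
    X = A_adj(b)                                                        # X_0 = A^*(b)
    lam = max(lam_bar, min(3.0, X.size / b.size) * np.linalg.norm(X, 2))  # lam_1
    for _ in range(maxiter):
        for _ in range(I_m):                                            # inner loop, fixed lam
            B = X + mu0 * A_adj(b - A_op(X))
            U, s, Vt = np.linalg.svd(B, full_matrices=False)
            X_new = (U * half_threshold_sv(s, lam * mu0)) @ Vt
            small = np.linalg.norm(X_new - X, 'fro') / max(1.0, np.linalg.norm(X, 'fro')) < xtol
            X = X_new
            if small:
                break
        s_X = np.linalg.svd(X, compute_uv=False)                        # continuation step (5.3)
        s_pos = s_X[s_X > 1e-8]
        sigma_rk = s_pos[-1] if s_pos.size > 0 else 0.0
        lam = max(lam_bar, min(eta * lam, np.sqrt(96.0) / (9.0 * mu0) * sigma_rk ** 1.5))
        if lam == lam_bar:
            break
    return X
```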

6 HFPA algorithm: HFP algorithm with an approximate SVD

In Algorithm 5.2, computing singular value decompositions is the main computational cost. Inspired by the work of Cai et al. [4] and Ma et al. [29], instead of computing the full SVD of the matrix $B$ in each iteration, we implement a variant of the HFP algorithm in which we compute only a rank-$r$ approximation to $B$, where $r$ is an estimate of the rank of the optimal solution. We call this half norm fixed point algorithm with an approximate SVD the HFPA algorithm. This approach greatly reduces the computational effort required by the algorithm. Specifically, we compute an approximate SVD by a fast Monte Carlo algorithm: the Linear Time SVD algorithm developed by Drineas et al. [16]. For a given matrix $A\in\mathbb{R}^{m\times n}$, parameters $c_s,k_s\in\mathbb{Z}_+$ with $1\le k_s\le c_s\le n$, and probabilities $\{p_i\}_{i=1}^n$ with $p_i\ge 0$ and $\sum_{i=1}^n p_i=1$, this algorithm returns an approximation to the largest $k_s$ singular values and the corresponding left singular vectors of the matrix $A$ in linear $O(m+n)$ time. The Linear Time Approximate SVD Algorithm is outlined below.

Linear Time Approximate SVD Algorithm [16, 29]
- Input: $A\in\mathbb{R}^{m\times n}$; $c_s,k_s\in\mathbb{Z}_+$ such that $1\le k_s\le c_s\le n$; $\{p_i\}_{i=1}^n$ such that $p_i\ge 0$, $\sum_{i=1}^n p_i=1$.
- Output: $H_{k_s}\in\mathbb{R}^{m\times k_s}$ and $\sigma_t(C)$, $t=1,2,\dots,k_s$.
- For $t=1$ to $c_s$:
  - pick $i_t\in\{1,2,\dots,n\}$ with $\operatorname{Prob}\{i_t=\alpha\}=p_\alpha$, $\alpha=1,2,\dots,n$;
  - set $C^{(t)}=A^{(i_t)}/\sqrt{c_s\,p_{i_t}}$.
- Compute $C^TC$ and its SVD, say $C^TC=\sum_{t=1}^{c_s}\sigma_t^2(C)\,y^t{y^t}^T$.
- Compute $h^t=Cy^t/\sigma_t(C)$ for $t=1,2,\dots,k_s$.
- Return $H_{k_s}$, where $H_{k_s}^{(t)}=h^t$, and $\sigma_t(C)$, $t=1,2,\dots,k_s$.

The outputs $\sigma_t(C)$ ($t=1,2,\dots,k_s$) are approximations to the largest $k_s$ singular values of $A$, and $H_{k_s}^{(t)}$ ($t=1,2,\dots,k_s$) are approximations to the corresponding left singular vectors. Thus, the SVD of $A$ is approximated by
$$A\approx A_{k_s}:=H_{k_s}\operatorname{Diag}(\sigma(C))\big(A^TH_{k_s}\operatorname{Diag}(1/\sigma(C))\big)^T.$$
Drineas et al. [16] prove that, with high probability, $A_{k_s}$ is an approximation to the best rank-$k_s$ approximation of $A$.

In our numerical experiments, as in [29], we set $c_s=2r_m-2$, where $r_m=\big\lfloor\big(m+n-\sqrt{(m+n)^2-4p}\big)/2\big\rfloor$ is, for a given number of sampled entries, the largest rank of $m\times n$ matrices for which the matrix completion problem has a unique solution. We refer to [29] for how to set $k_s$. We also set all $p_i$ equal to $1/n$. For more details on the choices of the parameters in the Linear Time Approximate SVD Algorithm, please see [16, 29]. The Linear Time Approximate SVD code we use is written by Shiqian Ma and is available at sm2756/fpca.htm.
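A hedged NumPy sketch of the sampling step described above follows (function names are ours; uniform probabilities by default, as used in the experiments; it assumes the sampled top-$k_s$ singular values are nonzero).

```python
import numpy as np

def linear_time_svd(A, cs, ks, p=None, rng=None):
    """Approximate the top ks singular values and left singular vectors of A by
    sampling cs columns with probabilities p and rescaling them (Drineas et al. [16])."""
    rng = np.random.default_rng() if rng is None else rng
    m, n = A.shape
    p = np.full(n, 1.0 / n) if p is None else p
    idx = rng.choice(n, size=cs, replace=True, p=p)          # i_t with Prob{i_t = a} = p_a
    C = A[:, idx] / np.sqrt(cs * p[idx])                     # C^(t) = A^(i_t) / sqrt(cs * p_{i_t})
    w, Y = np.linalg.eigh(C.T @ C)                           # C^T C = sum_t sigma_t(C)^2 y^t y^t^T
    order = np.argsort(w)[::-1][:ks]
    sigma = np.sqrt(np.maximum(w[order], 0.0))
    H = (C @ Y[:, order]) / sigma                            # h^t = C y^t / sigma_t(C)
    return H, sigma

def approx_rank_ks(A, H, sigma):
    """A_ks = H Diag(sigma) (A^T H Diag(1/sigma))^T, the rank-ks approximation of A."""
    V = (A.T @ H) / sigma
    return (H * sigma) @ V.T
```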

7 Numerical experiments

In this section, we report numerical results on a series of matrix completion problems of the form (1.2) to demonstrate the performance of the HFPA algorithm. The purpose of the numerical experiments is to assess the effectiveness, accuracy, robustness and convergence of the algorithm. The effectiveness is measured by how few measurements are required to exactly recover a low-rank matrix: the fewer measurements an algorithm needs, the better the algorithm. Under the same measurements, the shorter the time used by an algorithm and the higher the accuracy it achieves, the better the algorithm. We also test the robustness of the algorithm with respect to varying dimensions, varying ranks and varying sampling ratios. To compare performance in finding low-rank matrix solutions, some other competitive algorithms, namely the singular value thresholding algorithm (SVT²) [4], the fixed point continuation algorithm based on an approximate SVD using the iterative Lanczos algorithm (FPC³) [29], and the fixed point continuation algorithm based on a linear time approximate SVD (FPCA⁴) [29], are also tested together with our HFPA algorithm. Note that the former three algorithms are all based on nuclear norm minimization, while these four algorithms all depend on an approximate SVD. We also note that some manifold-based algorithms without SVD, such as GenRTR [1], RTRMC [3], OptSpace [24] and LMaFit [42], have good performance; because of space constraints, we do not compare with them. All computational experiments were performed in MATLAB R2009a on a Dell desktop computer with an Intel(R) Core(TM) CPU and 3.23 GB of RAM.

² The SVT code is available online; it was written by Emmanuel Candès, October 2008, and last modified by Farshad Harirchi and Stephen Becker in April.
³ The FPC code is available online; it was coded by Stephen Becker in March, with reference to [29].
⁴ The FPCA code is available at sm2756/fpca.htm; it was coded and modified by Shiqian Ma, July 2008 and April 2009, respectively.

In our simulations, we generate $m\times n$ matrices of rank $r$ in the same way as in related work (for instance, [4, 6, 29]): we first generate random matrices $M_L\in\mathbb{R}^{m\times r}$ and $M_R\in\mathbb{R}^{n\times r}$ with i.i.d. Gaussian entries, and then set $M=M_LM_R^T$. We then sample a subset $\Omega$ of $p$ entries uniformly at random. Thus, the entries of $M$ on $\Omega$ are the observed data and $M$ is the true unknown matrix. For each problem with an $m\times n$ matrix $M$, measurement number $p$ and rank $r$, we solve a fixed number of randomly created matrix completion problems. We use SR $:=p/(mn)$, i.e., the number of measurements divided by the number of entries of the matrix, to denote the sampling ratio. Recall that an $m\times n$ matrix of rank $r$ has df $:=r(m+n-r)$ degrees of freedom. Then OS $:=p/\mathrm{df}$ is the oversampling ratio, i.e., the ratio between the number of sampled entries and the true dimensionality of an $m\times n$ matrix of rank $r$. Note that if OS $<1$, there are always infinitely many matrices of rank $r$ consistent with the given entries, so we cannot hope to recover the matrix in this situation. We also note that when OS $\ge 1$, the closer OS is to 1, the more difficult it is to recover the matrix. For this reason, following [29], we call a matrix completion problem an easy problem if its OS and SR are such that OS $\cdot$ SR $>0.5$ and OS $>2.6$, and correspondingly a hard problem if OS $\cdot$ SR $\le 0.5$ or OS $\le 2.6$. In the tables, FR $:=1/\mathrm{OS}=\mathrm{df}/p$ is a quantity often used in the literature. We use "rank" to denote the average rank of the matrices recovered by an algorithm, and "time" and "iter" to denote the average time (in seconds) and the average number of iterations, respectively, that an algorithm takes to reach convergence. We use three relative errors,
$$\text{rel.err}(\Omega):=\frac{\|M(\Omega)-X_{\mathrm{opt}}(\Omega)\|_F}{\|M(\Omega)\|_F},\qquad \text{rel.err.F}:=\frac{\|M-X_{\mathrm{opt}}\|_F}{\|M\|_F},\qquad \text{rel.err.s}:=\frac{\|M-X_{\mathrm{opt}}\|_2}{\|M\|_2},$$
to evaluate the closeness of $X_{\mathrm{opt}}$ to $M$, where $X_{\mathrm{opt}}$ is the solution to (1.2) obtained by an algorithm.

The parameters and initial values used in the HFPA algorithm for matrix completion problems are listed in Table 1.

Table 1: Parameters and initial values in the HFPA algorithm
$\bar\lambda=10^{-4}$, $\eta=1/4$, $\lambda_1=\min\{3,\,mn/p\}\,\|\mathcal{A}^*(b)\|_2$, $X_0=\mathcal{A}^*(b)$, maxiter $=10{,}000$;
if {hard problem and $\max(m,n)<1000$}: $\mu_0=1.7$, $I_m=200$;
elseif {SR $<0.5$ and $\min(m,n)\ge 1000$}: $\mu_0=1.98$, $I_m=10$;
else: $\mu_0=1.5$, $I_m=10$.
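The random test instances described above can be generated with a short sketch like the following (NumPy; the function names are ours).

```python
import numpy as np

def random_completion_problem(m, n, r, p, rng=None):
    """Generate a test instance as in Section 7: M = M_L M_R^T with i.i.d.
    Gaussian factors, and a set Omega of p entries sampled uniformly at random.
    Returns the true matrix M, a boolean mask for Omega, and the observations b."""
    rng = np.random.default_rng() if rng is None else rng
    M = rng.standard_normal((m, r)) @ rng.standard_normal((n, r)).T
    flat = np.zeros(m * n, dtype=bool)
    flat[rng.choice(m * n, size=p, replace=False)] = True
    mask = flat.reshape(m, n)
    return M, mask, M[mask]

def problem_ratios(m, n, r, p):
    """SR = p/(mn), df = r(m+n-r), OS = p/df, FR = df/p."""
    df = r * (m + n - r)
    return {"SR": p / (m * n), "OS": p / df, "FR": df / p}
```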

7.1 Results for randomly created noiseless matrix completion problems

Our first experiment compares the recovering ability of HFPA with SVT, FPC and FPCA on small and easy matrix completion problems; here a small matrix means that the dimension of the matrix is less than 200. Specifically, in the first experiment we take $m=n=100$, OS $=3$, FR $=0.33$, and let the true rank increase from 6 to 16 in steps of 1. The tolerance in the four algorithms is set to $10^{-4}$. For each scale of these problems, we solve 10 randomly created matrix completion problems. The computational results for this experiment are displayed in Table 2.

From Table 2, the first observation is that only HFPA recovers all the true ranks. When $r<11$, the ranks recovered by SVT are larger than the true ranks; the ranks recovered by FPC are also larger than the true ones when $r<10$; the same happens to FPCA when the true rank equals 16. The second observation is that HFPA runs fastest among the four algorithms for most of the problems. As the true ranks change from 6 to 16, the time cost by HFPA is almost unchanged. Although for $r\le 6$ FPCA is slightly faster than HFPA by a few hundredths of a second, HFPA is much faster than FPCA when $r\ge 12$. Obviously, HFPA runs faster than SVT and FPC. Finally, let us compare the accuracies achieved by the four algorithms. We observe that HFPA achieves the most accurate solutions for most of the problems; even when $r\ge 12$, at least one of the three relative errors of HFPA reaches the order of $10^{-6}$; meanwhile, the accuracies of SVT and FPC are not very good when $r\le 7$, and FPCA begins to yield very inaccurate solutions when $r\ge 13$. We conclude that for small and easy matrix completion problems, HFPA is very fast, effective and robust.

Our second experiment compares the recovering abilities of HFPA with SVT, FPC and FPCA on small but hard matrix completion problems. These problems are hard and challenging to recover because the oversampling ratio OS $=2$ is very close to 1, which implies that the observed data are very limited relative to the degrees of freedom of the unknown matrices. In this experiment, we take $m=n=100$, OS $=2$, FR $=0.50$, and let $r$ increase from 2 to 24 in steps of 2. For this set of problems, SR ranges from 7.9% to 84.5%. The tolerance in this experiment is set to $10^{-6}$. For each scale of these problems, we again solve 10 randomly created matrix completion problems. The results are displayed in Table 3.

From Table 3, we find that SVT and FPC do not work well, in the sense that the ranks they recover are far larger than the true ranks and the accuracies of their solutions are poor until the true rank increases to 20. It is clear that FPCA and HFPA both work very well. We observe that as $r$ increases from 2 to 24, the times cost by HFPA and FPCA both increase, but slowly. As we can see, HFPA attains accuracy as good as or slightly better than FPCA, while being clearly faster.

Now we test our algorithm on large randomly created matrix completion problems, running only 5 instances for each large scale problem. The numerical results of HFPA for easy and large matrix completion problems are presented in Table 4. For easy problems, since SVT performs in general better than FPC, we omit the results of FPC in Table 4 for the sake of space. For example, when $m=n=1000$, $r=10$, OS $=6$, SR $=0.119$ and FR $=0.17$, FPC costs more than 350 seconds to recover the matrix while SVT costs only about 8 seconds, and they achieve similar accuracy. From Table 4, we can see that for an unknown matrix of rank 200 with a 38.7% sampling ratio, HFPA recovers it well in only 12 minutes, while SVT needs half an hour and FPCA fails to work.
We also find that, for an unknown matrix of fixed size, decreasing the sampling ratio has little influence on the computational time of HFPA, whereas increasing the sampling ratio remarkably improves its accuracy. We can conclude that for these easy problems, some of which have a very low rank and some of which have a low but not very low rank, HFPA is always powerful enough to recover them.

Table 2: Comparison of SVT, FPC, FPCA and HFPA for randomly created small and easy matrix completion problems ($m=n=100$, $r=6:1:16$, OS $=3$, FR $=0.33$, xtol $=10^{-4}$). For each rank $r$ and sampling ratio SR, the table reports, for each solver, the recovered rank, the number of iterations, the time, and the relative errors rel.err($\Omega$), rel.err.F and rel.err.s.

Table 3: Comparison of SVT, FPC, FPCA and HFPA for randomly created small but hard matrix completion problems ($m=n=100$, $r=2:2:24$, OS $=2$, FR $=0.50$, xtol $=10^{-6}$). For each rank $r$ and sampling ratio SR, the table reports, for each solver, the recovered rank, the number of iterations, the time, and the relative errors rel.err($\Omega$), rel.err.F and rel.err.s. For the smallest rank, SVT is divergent.

Table 4: Comparison of SVT, FPCA and HFPA for randomly created large and easy matrix completion problems (xtol $=10^{-4}$). For each problem ($m=n$, $r$, OS, SR, FR), the table reports the time and rel.err.F of SVT, HFPA and FPCA; for the largest instance, FPCA runs out of memory.

Table 5: Comparison of FPCA and HFPA for randomly created large and hard matrix completion problems (xtol $=10^{-4}$). For each problem ($m=n$, $r$, OS, SR, FR), the table reports the time, rel.err($\Omega$) and rel.err.F of HFPA and FPCA.

For hard problems, without exception, SVT and FPC either diverge, cannot terminate within one hour, or yield very inaccurate solutions. For example, when $m=n=200$, $r=10$ and SR $=0.195$, which is the simplest case, SVT costs more than 300 seconds to recover a matrix of rank 43 with a relative error in the Frobenius norm on the order of $10^{-1}$, while FPC recovers a matrix of rank 69. Another simple example: when $m=n=200$, $r=20$ and SR $=0.380$, SVT costs more than 700 seconds to recover a matrix of rank 87 with a relative error in the Frobenius norm on the order of $10^{-1}$, while FPC recovers a matrix of rank 96. Therefore, in this case, only FPCA is comparable to HFPA. The results are displayed in Table 5. From Table 5, we can see that HFPA still has a powerful recovering ability for hard and large matrix completion problems.

7.2 Results for matrix completion from noisy sampled entries

In this subsection, we briefly demonstrate the results of HFPA for matrix completion problems with noisy sampled entries. Suppose we observe data from the following model:
$$B_{ij}=M_{ij}+Z_{ij},\quad (i,j)\in\Omega, \qquad (7.1)$$
where $Z$ is a zero-mean Gaussian white noise with standard deviation $\sigma$. The results of HFPA, together with those of SVT and FPCA, are displayed in Table 6; the reported quantities are averages over 5 runs. From Table 6, we see that for noisy sampled data HFPA performs as well as or slightly better than FPCA, while it is clearly more powerful than SVT.

Table 6: Numerical results for SVT, HFPA and FPCA on random matrix completion problems with noisy data. For each noise level $\sigma$ and problem ($m=n$, $r$, OS, SR), the table reports the time and rel.err.F of SVT, HFPA and FPCA. Notes: in one case the rank recovered by SVT is 125; in another, the SVT algorithm cannot terminate within one hour.

7.3 Results for real problems

In this subsection, we apply HFPA to image inpainting problems in order to test its effectiveness on real data matrices. It is well known that grayscale images and color images can be expressed by matrices and tensors, respectively. In grayscale image inpainting, the grayscale values of some of the pixels of the image are missing, and we want to fill in these missing values. If the image is of low rank, or of numerical low rank, we can solve the image inpainting problem as a matrix completion problem (1.2) (see, e.g., [29]). Here, Figure 2(a) is a grayscale image of rank 600. We applied the SVD to Figure 2(a) and truncated it to obtain the rank-80 image shown in Figure 2(b).

Figure 2: (a) Original image with full rank; (b) image of rank 80 truncated from (a); (c) 50% randomly masked from (a); (d) recovered image from (c) (rel.err.F $=$ 8.30e-2); (e) 50% randomly masked from (b); (f) recovered image from (e) (rel.err.F $=$ 6.56e-2); (g) deterministically masked from (b); (h) recovered image from (g) (rel.err.F $=$ 6.97e-2).


More information

Information-Theoretic Limits of Matrix Completion

Information-Theoretic Limits of Matrix Completion Information-Theoretic Limits of Matrix Completion Erwin Riegler, David Stotz, and Helmut Bölcskei Dept. IT & EE, ETH Zurich, Switzerland Email: {eriegler, dstotz, boelcskei}@nari.ee.ethz.ch Abstract We

More information

A Randomized Algorithm for the Approximation of Matrices

A Randomized Algorithm for the Approximation of Matrices A Randomized Algorithm for the Approximation of Matrices Per-Gunnar Martinsson, Vladimir Rokhlin, and Mark Tygert Technical Report YALEU/DCS/TR-36 June 29, 2006 Abstract Given an m n matrix A and a positive

More information

Conditions for Robust Principal Component Analysis

Conditions for Robust Principal Component Analysis Rose-Hulman Undergraduate Mathematics Journal Volume 12 Issue 2 Article 9 Conditions for Robust Principal Component Analysis Michael Hornstein Stanford University, mdhornstein@gmail.com Follow this and

More information

Block Coordinate Descent for Regularized Multi-convex Optimization

Block Coordinate Descent for Regularized Multi-convex Optimization Block Coordinate Descent for Regularized Multi-convex Optimization Yangyang Xu and Wotao Yin CAAM Department, Rice University February 15, 2013 Multi-convex optimization Model definition Applications Outline

More information

Minimizing the Difference of L 1 and L 2 Norms with Applications

Minimizing the Difference of L 1 and L 2 Norms with Applications 1/36 Minimizing the Difference of L 1 and L 2 Norms with Department of Mathematical Sciences University of Texas Dallas May 31, 2017 Partially supported by NSF DMS 1522786 2/36 Outline 1 A nonconvex approach:

More information

ECE G: Special Topics in Signal Processing: Sparsity, Structure, and Inference

ECE G: Special Topics in Signal Processing: Sparsity, Structure, and Inference ECE 18-898G: Special Topics in Signal Processing: Sparsity, Structure, and Inference Low-rank matrix recovery via convex relaxations Yuejie Chi Department of Electrical and Computer Engineering Spring

More information

Augmented Lagrangian alternating direction method for matrix separation based on low-rank factorization

Augmented Lagrangian alternating direction method for matrix separation based on low-rank factorization Augmented Lagrangian alternating direction method for matrix separation based on low-rank factorization Yuan Shen Zaiwen Wen Yin Zhang January 11, 2011 Abstract The matrix separation problem aims to separate

More information

Sparse Approximation via Penalty Decomposition Methods

Sparse Approximation via Penalty Decomposition Methods Sparse Approximation via Penalty Decomposition Methods Zhaosong Lu Yong Zhang February 19, 2012 Abstract In this paper we consider sparse approximation problems, that is, general l 0 minimization problems

More information

Least Sparsity of p-norm based Optimization Problems with p > 1

Least Sparsity of p-norm based Optimization Problems with p > 1 Least Sparsity of p-norm based Optimization Problems with p > Jinglai Shen and Seyedahmad Mousavi Original version: July, 07; Revision: February, 08 Abstract Motivated by l p -optimization arising from

More information

Matrix Completion: Fundamental Limits and Efficient Algorithms

Matrix Completion: Fundamental Limits and Efficient Algorithms Matrix Completion: Fundamental Limits and Efficient Algorithms Sewoong Oh PhD Defense Stanford University July 23, 2010 1 / 33 Matrix completion Find the missing entries in a huge data matrix 2 / 33 Example

More information

Homotopy methods based on l 0 norm for the compressed sensing problem

Homotopy methods based on l 0 norm for the compressed sensing problem Homotopy methods based on l 0 norm for the compressed sensing problem Wenxing Zhu, Zhengshan Dong Center for Discrete Mathematics and Theoretical Computer Science, Fuzhou University, Fuzhou 350108, China

More information

ON THE GLOBAL AND LINEAR CONVERGENCE OF THE GENERALIZED ALTERNATING DIRECTION METHOD OF MULTIPLIERS

ON THE GLOBAL AND LINEAR CONVERGENCE OF THE GENERALIZED ALTERNATING DIRECTION METHOD OF MULTIPLIERS ON THE GLOBAL AND LINEAR CONVERGENCE OF THE GENERALIZED ALTERNATING DIRECTION METHOD OF MULTIPLIERS WEI DENG AND WOTAO YIN Abstract. The formulation min x,y f(x) + g(y) subject to Ax + By = b arises in

More information

Matrix completion: Fundamental limits and efficient algorithms. Sewoong Oh Stanford University

Matrix completion: Fundamental limits and efficient algorithms. Sewoong Oh Stanford University Matrix completion: Fundamental limits and efficient algorithms Sewoong Oh Stanford University 1 / 35 Low-rank matrix completion Low-rank Data Matrix Sparse Sampled Matrix Complete the matrix from small

More information

IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 57, NO. 11, NOVEMBER On the Performance of Sparse Recovery

IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 57, NO. 11, NOVEMBER On the Performance of Sparse Recovery IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 57, NO. 11, NOVEMBER 2011 7255 On the Performance of Sparse Recovery Via `p-minimization (0 p 1) Meng Wang, Student Member, IEEE, Weiyu Xu, and Ao Tang, Senior

More information

Research Note. A New Infeasible Interior-Point Algorithm with Full Nesterov-Todd Step for Semi-Definite Optimization

Research Note. A New Infeasible Interior-Point Algorithm with Full Nesterov-Todd Step for Semi-Definite Optimization Iranian Journal of Operations Research Vol. 4, No. 1, 2013, pp. 88-107 Research Note A New Infeasible Interior-Point Algorithm with Full Nesterov-Todd Step for Semi-Definite Optimization B. Kheirfam We

More information

Big Data Analytics: Optimization and Randomization

Big Data Analytics: Optimization and Randomization Big Data Analytics: Optimization and Randomization Tianbao Yang Tutorial@ACML 2015 Hong Kong Department of Computer Science, The University of Iowa, IA, USA Nov. 20, 2015 Yang Tutorial for ACML 15 Nov.

More information

Lecture 5 : Projections

Lecture 5 : Projections Lecture 5 : Projections EE227C. Lecturer: Professor Martin Wainwright. Scribe: Alvin Wan Up until now, we have seen convergence rates of unconstrained gradient descent. Now, we consider a constrained minimization

More information

Compressed Sensing and Robust Recovery of Low Rank Matrices

Compressed Sensing and Robust Recovery of Low Rank Matrices Compressed Sensing and Robust Recovery of Low Rank Matrices M. Fazel, E. Candès, B. Recht, P. Parrilo Electrical Engineering, University of Washington Applied and Computational Mathematics Dept., Caltech

More information

A Customized ADMM for Rank-Constrained Optimization Problems with Approximate Formulations

A Customized ADMM for Rank-Constrained Optimization Problems with Approximate Formulations A Customized ADMM for Rank-Constrained Optimization Problems with Approximate Formulations Chuangchuang Sun and Ran Dai Abstract This paper proposes a customized Alternating Direction Method of Multipliers

More information

Robust Principal Component Analysis

Robust Principal Component Analysis ELE 538B: Mathematics of High-Dimensional Data Robust Principal Component Analysis Yuxin Chen Princeton University, Fall 2018 Disentangling sparse and low-rank matrices Suppose we are given a matrix M

More information

Linear Systems. Carlo Tomasi

Linear Systems. Carlo Tomasi Linear Systems Carlo Tomasi Section 1 characterizes the existence and multiplicity of the solutions of a linear system in terms of the four fundamental spaces associated with the system s matrix and of

More information

Numerical Linear Algebra Primer. Ryan Tibshirani Convex Optimization /36-725

Numerical Linear Algebra Primer. Ryan Tibshirani Convex Optimization /36-725 Numerical Linear Algebra Primer Ryan Tibshirani Convex Optimization 10-725/36-725 Last time: proximal gradient descent Consider the problem min g(x) + h(x) with g, h convex, g differentiable, and h simple

More information

Lecture: Matrix Completion

Lecture: Matrix Completion 1/56 Lecture: Matrix Completion http://bicmr.pku.edu.cn/~wenzw/bigdata2017.html Acknowledgement: this slides is based on Prof. Jure Leskovec and Prof. Emmanuel Candes s lecture notes Recommendation systems

More information

Iterative Methods for Solving A x = b

Iterative Methods for Solving A x = b Iterative Methods for Solving A x = b A good (free) online source for iterative methods for solving A x = b is given in the description of a set of iterative solvers called templates found at netlib: http

More information

Robust Principal Component Analysis Based on Low-Rank and Block-Sparse Matrix Decomposition

Robust Principal Component Analysis Based on Low-Rank and Block-Sparse Matrix Decomposition Robust Principal Component Analysis Based on Low-Rank and Block-Sparse Matrix Decomposition Gongguo Tang and Arye Nehorai Department of Electrical and Systems Engineering Washington University in St Louis

More information

SOLVING A LOW-RANK FACTORIZATION MODEL FOR MATRIX COMPLETION BY A NONLINEAR SUCCESSIVE OVER-RELAXATION ALGORITHM

SOLVING A LOW-RANK FACTORIZATION MODEL FOR MATRIX COMPLETION BY A NONLINEAR SUCCESSIVE OVER-RELAXATION ALGORITHM SOLVING A LOW-RANK FACTORIZATION MODEL FOR MATRIX COMPLETION BY A NONLINEAR SUCCESSIVE OVER-RELAXATION ALGORITHM ZAIWEN WEN, WOTAO YIN, AND YIN ZHANG CAAM TECHNICAL REPORT TR10-07 DEPARTMENT OF COMPUTATIONAL

More information

Nonlinear Optimization for Optimal Control

Nonlinear Optimization for Optimal Control Nonlinear Optimization for Optimal Control Pieter Abbeel UC Berkeley EECS Many slides and figures adapted from Stephen Boyd [optional] Boyd and Vandenberghe, Convex Optimization, Chapters 9 11 [optional]

More information

A Fast Augmented Lagrangian Algorithm for Learning Low-Rank Matrices

A Fast Augmented Lagrangian Algorithm for Learning Low-Rank Matrices A Fast Augmented Lagrangian Algorithm for Learning Low-Rank Matrices Ryota Tomioka 1, Taiji Suzuki 1, Masashi Sugiyama 2, Hisashi Kashima 1 1 The University of Tokyo 2 Tokyo Institute of Technology 2010-06-22

More information

Compressed sensing. Or: the equation Ax = b, revisited. Terence Tao. Mahler Lecture Series. University of California, Los Angeles

Compressed sensing. Or: the equation Ax = b, revisited. Terence Tao. Mahler Lecture Series. University of California, Los Angeles Or: the equation Ax = b, revisited University of California, Los Angeles Mahler Lecture Series Acquiring signals Many types of real-world signals (e.g. sound, images, video) can be viewed as an n-dimensional

More information

Reconstruction from Anisotropic Random Measurements

Reconstruction from Anisotropic Random Measurements Reconstruction from Anisotropic Random Measurements Mark Rudelson and Shuheng Zhou The University of Michigan, Ann Arbor Coding, Complexity, and Sparsity Workshop, 013 Ann Arbor, Michigan August 7, 013

More information

Lecture 8: February 9

Lecture 8: February 9 0-725/36-725: Convex Optimiation Spring 205 Lecturer: Ryan Tibshirani Lecture 8: February 9 Scribes: Kartikeya Bhardwaj, Sangwon Hyun, Irina Caan 8 Proximal Gradient Descent In the previous lecture, we

More information

Linear Systems. Carlo Tomasi. June 12, r = rank(a) b range(a) n r solutions

Linear Systems. Carlo Tomasi. June 12, r = rank(a) b range(a) n r solutions Linear Systems Carlo Tomasi June, 08 Section characterizes the existence and multiplicity of the solutions of a linear system in terms of the four fundamental spaces associated with the system s matrix

More information

Optimization Theory. A Concise Introduction. Jiongmin Yong

Optimization Theory. A Concise Introduction. Jiongmin Yong October 11, 017 16:5 ws-book9x6 Book Title Optimization Theory 017-08-Lecture Notes page 1 1 Optimization Theory A Concise Introduction Jiongmin Yong Optimization Theory 017-08-Lecture Notes page Optimization

More information

RECOVERING LOW-RANK AND SPARSE COMPONENTS OF MATRICES FROM INCOMPLETE AND NOISY OBSERVATIONS. December

RECOVERING LOW-RANK AND SPARSE COMPONENTS OF MATRICES FROM INCOMPLETE AND NOISY OBSERVATIONS. December RECOVERING LOW-RANK AND SPARSE COMPONENTS OF MATRICES FROM INCOMPLETE AND NOISY OBSERVATIONS MIN TAO AND XIAOMING YUAN December 31 2009 Abstract. Many applications arising in a variety of fields can be

More information

Uses of duality. Geoff Gordon & Ryan Tibshirani Optimization /

Uses of duality. Geoff Gordon & Ryan Tibshirani Optimization / Uses of duality Geoff Gordon & Ryan Tibshirani Optimization 10-725 / 36-725 1 Remember conjugate functions Given f : R n R, the function is called its conjugate f (y) = max x R n yt x f(x) Conjugates appear

More information

Projection methods to solve SDP

Projection methods to solve SDP Projection methods to solve SDP Franz Rendl http://www.math.uni-klu.ac.at Alpen-Adria-Universität Klagenfurt Austria F. Rendl, Oberwolfach Seminar, May 2010 p.1/32 Overview Augmented Primal-Dual Method

More information

Principal Component Analysis

Principal Component Analysis Machine Learning Michaelmas 2017 James Worrell Principal Component Analysis 1 Introduction 1.1 Goals of PCA Principal components analysis (PCA) is a dimensionality reduction technique that can be used

More information

Accelerated primal-dual methods for linearly constrained convex problems

Accelerated primal-dual methods for linearly constrained convex problems Accelerated primal-dual methods for linearly constrained convex problems Yangyang Xu SIAM Conference on Optimization May 24, 2017 1 / 23 Accelerated proximal gradient For convex composite problem: minimize

More information

Sparse signals recovered by non-convex penalty in quasi-linear systems

Sparse signals recovered by non-convex penalty in quasi-linear systems Cui et al. Journal of Inequalities and Applications 018) 018:59 https://doi.org/10.1186/s13660-018-165-8 R E S E A R C H Open Access Sparse signals recovered by non-conve penalty in quasi-linear systems

More information

Lecture Note 7: Iterative methods for solving linear systems. Xiaoqun Zhang Shanghai Jiao Tong University

Lecture Note 7: Iterative methods for solving linear systems. Xiaoqun Zhang Shanghai Jiao Tong University Lecture Note 7: Iterative methods for solving linear systems Xiaoqun Zhang Shanghai Jiao Tong University Last updated: December 24, 2014 1.1 Review on linear algebra Norms of vectors and matrices vector

More information

Sparse Optimization Lecture: Basic Sparse Optimization Models

Sparse Optimization Lecture: Basic Sparse Optimization Models Sparse Optimization Lecture: Basic Sparse Optimization Models Instructor: Wotao Yin July 2013 online discussions on piazza.com Those who complete this lecture will know basic l 1, l 2,1, and nuclear-norm

More information

A Randomized Nonmonotone Block Proximal Gradient Method for a Class of Structured Nonlinear Programming

A Randomized Nonmonotone Block Proximal Gradient Method for a Class of Structured Nonlinear Programming A Randomized Nonmonotone Block Proximal Gradient Method for a Class of Structured Nonlinear Programming Zhaosong Lu Lin Xiao March 9, 2015 (Revised: May 13, 2016; December 30, 2016) Abstract We propose

More information

Compressed Sensing via Partial l 1 Minimization

Compressed Sensing via Partial l 1 Minimization WORCESTER POLYTECHNIC INSTITUTE Compressed Sensing via Partial l 1 Minimization by Lu Zhong A thesis Submitted to the Faculty of the WORCESTER POLYTECHNIC INSTITUTE in partial fulfillment of the requirements

More information

A GENERAL INERTIAL PROXIMAL POINT METHOD FOR MIXED VARIATIONAL INEQUALITY PROBLEM

A GENERAL INERTIAL PROXIMAL POINT METHOD FOR MIXED VARIATIONAL INEQUALITY PROBLEM A GENERAL INERTIAL PROXIMAL POINT METHOD FOR MIXED VARIATIONAL INEQUALITY PROBLEM CAIHUA CHEN, SHIQIAN MA, AND JUNFENG YANG Abstract. In this paper, we first propose a general inertial proximal point method

More information

We describe the generalization of Hazan s algorithm for symmetric programming

We describe the generalization of Hazan s algorithm for symmetric programming ON HAZAN S ALGORITHM FOR SYMMETRIC PROGRAMMING PROBLEMS L. FAYBUSOVICH Abstract. problems We describe the generalization of Hazan s algorithm for symmetric programming Key words. Symmetric programming,

More information

A direct formulation for sparse PCA using semidefinite programming

A direct formulation for sparse PCA using semidefinite programming A direct formulation for sparse PCA using semidefinite programming A. d Aspremont, L. El Ghaoui, M. Jordan, G. Lanckriet ORFE, Princeton University & EECS, U.C. Berkeley A. d Aspremont, INFORMS, Denver,

More information

CS598 Machine Learning in Computational Biology (Lecture 5: Matrix - part 2) Professor Jian Peng Teaching Assistant: Rongda Zhu

CS598 Machine Learning in Computational Biology (Lecture 5: Matrix - part 2) Professor Jian Peng Teaching Assistant: Rongda Zhu CS598 Machine Learning in Computational Biology (Lecture 5: Matrix - part 2) Professor Jian Peng Teaching Assistant: Rongda Zhu Feature engineering is hard 1. Extract informative features from domain knowledge

More information

Introduction to Compressed Sensing

Introduction to Compressed Sensing Introduction to Compressed Sensing Alejandro Parada, Gonzalo Arce University of Delaware August 25, 2016 Motivation: Classical Sampling 1 Motivation: Classical Sampling Issues Some applications Radar Spectral

More information

Accelerated Block-Coordinate Relaxation for Regularized Optimization

Accelerated Block-Coordinate Relaxation for Regularized Optimization Accelerated Block-Coordinate Relaxation for Regularized Optimization Stephen J. Wright Computer Sciences University of Wisconsin, Madison October 09, 2012 Problem descriptions Consider where f is smooth

More information

Kaisa Joki Adil M. Bagirov Napsu Karmitsa Marko M. Mäkelä. New Proximal Bundle Method for Nonsmooth DC Optimization

Kaisa Joki Adil M. Bagirov Napsu Karmitsa Marko M. Mäkelä. New Proximal Bundle Method for Nonsmooth DC Optimization Kaisa Joki Adil M. Bagirov Napsu Karmitsa Marko M. Mäkelä New Proximal Bundle Method for Nonsmooth DC Optimization TUCS Technical Report No 1130, February 2015 New Proximal Bundle Method for Nonsmooth

More information

Acyclic Semidefinite Approximations of Quadratically Constrained Quadratic Programs

Acyclic Semidefinite Approximations of Quadratically Constrained Quadratic Programs Acyclic Semidefinite Approximations of Quadratically Constrained Quadratic Programs Raphael Louca & Eilyan Bitar School of Electrical and Computer Engineering American Control Conference (ACC) Chicago,

More information

Bindel, Fall 2009 Matrix Computations (CS 6210) Week 8: Friday, Oct 17

Bindel, Fall 2009 Matrix Computations (CS 6210) Week 8: Friday, Oct 17 Logistics Week 8: Friday, Oct 17 1. HW 3 errata: in Problem 1, I meant to say p i < i, not that p i is strictly ascending my apologies. You would want p i > i if you were simply forming the matrices and

More information

Recovering any low-rank matrix, provably

Recovering any low-rank matrix, provably Recovering any low-rank matrix, provably Rachel Ward University of Texas at Austin October, 2014 Joint work with Yudong Chen (U.C. Berkeley), Srinadh Bhojanapalli and Sujay Sanghavi (U.T. Austin) Matrix

More information

arxiv: v1 [math.oc] 23 May 2017

arxiv: v1 [math.oc] 23 May 2017 A DERANDOMIZED ALGORITHM FOR RP-ADMM WITH SYMMETRIC GAUSS-SEIDEL METHOD JINCHAO XU, KAILAI XU, AND YINYU YE arxiv:1705.08389v1 [math.oc] 23 May 2017 Abstract. For multi-block alternating direction method

More information

Homework 5. Convex Optimization /36-725

Homework 5. Convex Optimization /36-725 Homework 5 Convex Optimization 10-725/36-725 Due Tuesday November 22 at 5:30pm submitted to Christoph Dann in Gates 8013 (Remember to a submit separate writeup for each problem, with your name at the top)

More information

Key Words: low-rank matrix recovery, iteratively reweighted least squares, matrix completion.

Key Words: low-rank matrix recovery, iteratively reweighted least squares, matrix completion. LOW-RANK MATRIX RECOVERY VIA ITERATIVELY REWEIGHTED LEAST SQUARES MINIMIZATION MASSIMO FORNASIER, HOLGER RAUHUT, AND RACHEL WARD Abstract. We present and analyze an efficient implementation of an iteratively

More information

Convex Optimization on Large-Scale Domains Given by Linear Minimization Oracles

Convex Optimization on Large-Scale Domains Given by Linear Minimization Oracles Convex Optimization on Large-Scale Domains Given by Linear Minimization Oracles Arkadi Nemirovski H. Milton Stewart School of Industrial and Systems Engineering Georgia Institute of Technology Joint research

More information

Numerical Methods I Non-Square and Sparse Linear Systems

Numerical Methods I Non-Square and Sparse Linear Systems Numerical Methods I Non-Square and Sparse Linear Systems Aleksandar Donev Courant Institute, NYU 1 donev@courant.nyu.edu 1 MATH-GA 2011.003 / CSCI-GA 2945.003, Fall 2014 September 25th, 2014 A. Donev (Courant

More information

Optimization methods

Optimization methods Optimization methods Optimization-Based Data Analysis http://www.cims.nyu.edu/~cfgranda/pages/obda_spring16 Carlos Fernandez-Granda /8/016 Introduction Aim: Overview of optimization methods that Tend to

More information

A Cross-Associative Neural Network for SVD of Nonsquared Data Matrix in Signal Processing

A Cross-Associative Neural Network for SVD of Nonsquared Data Matrix in Signal Processing IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 12, NO. 5, SEPTEMBER 2001 1215 A Cross-Associative Neural Network for SVD of Nonsquared Data Matrix in Signal Processing Da-Zheng Feng, Zheng Bao, Xian-Da Zhang

More information

THE SINGULAR VALUE DECOMPOSITION MARKUS GRASMAIR

THE SINGULAR VALUE DECOMPOSITION MARKUS GRASMAIR THE SINGULAR VALUE DECOMPOSITION MARKUS GRASMAIR 1. Definition Existence Theorem 1. Assume that A R m n. Then there exist orthogonal matrices U R m m V R n n, values σ 1 σ 2... σ p 0 with p = min{m, n},

More information

Elaine T. Hale, Wotao Yin, Yin Zhang

Elaine T. Hale, Wotao Yin, Yin Zhang , Wotao Yin, Yin Zhang Department of Computational and Applied Mathematics Rice University McMaster University, ICCOPT II-MOPTA 2007 August 13, 2007 1 with Noise 2 3 4 1 with Noise 2 3 4 1 with Noise 2

More information

A Brief Overview of Practical Optimization Algorithms in the Context of Relaxation

A Brief Overview of Practical Optimization Algorithms in the Context of Relaxation A Brief Overview of Practical Optimization Algorithms in the Context of Relaxation Zhouchen Lin Peking University April 22, 2018 Too Many Opt. Problems! Too Many Opt. Algorithms! Zero-th order algorithms:

More information

15-780: LinearProgramming

15-780: LinearProgramming 15-780: LinearProgramming J. Zico Kolter February 1-3, 2016 1 Outline Introduction Some linear algebra review Linear programming Simplex algorithm Duality and dual simplex 2 Outline Introduction Some linear

More information

Recent Developments in Compressed Sensing

Recent Developments in Compressed Sensing Recent Developments in Compressed Sensing M. Vidyasagar Distinguished Professor, IIT Hyderabad m.vidyasagar@iith.ac.in, www.iith.ac.in/ m vidyasagar/ ISL Seminar, Stanford University, 19 April 2018 Outline

More information

HYBRID JACOBIAN AND GAUSS SEIDEL PROXIMAL BLOCK COORDINATE UPDATE METHODS FOR LINEARLY CONSTRAINED CONVEX PROGRAMMING

HYBRID JACOBIAN AND GAUSS SEIDEL PROXIMAL BLOCK COORDINATE UPDATE METHODS FOR LINEARLY CONSTRAINED CONVEX PROGRAMMING SIAM J. OPTIM. Vol. 8, No. 1, pp. 646 670 c 018 Society for Industrial and Applied Mathematics HYBRID JACOBIAN AND GAUSS SEIDEL PROXIMAL BLOCK COORDINATE UPDATE METHODS FOR LINEARLY CONSTRAINED CONVEX

More information

ACCELERATED FIRST-ORDER PRIMAL-DUAL PROXIMAL METHODS FOR LINEARLY CONSTRAINED COMPOSITE CONVEX PROGRAMMING

ACCELERATED FIRST-ORDER PRIMAL-DUAL PROXIMAL METHODS FOR LINEARLY CONSTRAINED COMPOSITE CONVEX PROGRAMMING ACCELERATED FIRST-ORDER PRIMAL-DUAL PROXIMAL METHODS FOR LINEARLY CONSTRAINED COMPOSITE CONVEX PROGRAMMING YANGYANG XU Abstract. Motivated by big data applications, first-order methods have been extremely

More information

A Smoothing Newton Method for Solving Absolute Value Equations

A Smoothing Newton Method for Solving Absolute Value Equations A Smoothing Newton Method for Solving Absolute Value Equations Xiaoqin Jiang Department of public basic, Wuhan Yangtze Business University, Wuhan 430065, P.R. China 392875220@qq.com Abstract: In this paper,

More information

Sparse Optimization Lecture: Dual Methods, Part I

Sparse Optimization Lecture: Dual Methods, Part I Sparse Optimization Lecture: Dual Methods, Part I Instructor: Wotao Yin July 2013 online discussions on piazza.com Those who complete this lecture will know dual (sub)gradient iteration augmented l 1 iteration

More information

Spectral gradient projection method for solving nonlinear monotone equations

Spectral gradient projection method for solving nonlinear monotone equations Journal of Computational and Applied Mathematics 196 (2006) 478 484 www.elsevier.com/locate/cam Spectral gradient projection method for solving nonlinear monotone equations Li Zhang, Weijun Zhou Department

More information

Rapid, Robust, and Reliable Blind Deconvolution via Nonconvex Optimization

Rapid, Robust, and Reliable Blind Deconvolution via Nonconvex Optimization Rapid, Robust, and Reliable Blind Deconvolution via Nonconvex Optimization Shuyang Ling Department of Mathematics, UC Davis Oct.18th, 2016 Shuyang Ling (UC Davis) 16w5136, Oaxaca, Mexico Oct.18th, 2016

More information