Homotopy methods based on $\ell_0$ norm for the compressed sensing problem


Homotopy methods based on $\ell_0$ norm for the compressed sensing problem

Wenxing Zhu, Zhengshan Dong

Center for Discrete Mathematics and Theoretical Computer Science, Fuzhou University, Fuzhou, China

Abstract. In this paper, two homotopy methods, which combine the advantage of the homotopy technique with the effectiveness of the iterative hard thresholding method, are presented for solving the compressed sensing problem. Under some mild assumptions, we prove that the limits of the sequences generated by the proposed homotopy methods are feasible solutions of the problem, and that under some conditions they are local minimizers of the problem. The proposed methods overcome the difficulty of the iterative hard thresholding method in choosing the regularization parameter by tracing solutions of the sparse problem along a homotopy path. Moreover, to improve the solution quality of the two methods, we modify them and give two empirical algorithms. Numerical experiments demonstrate the effectiveness of the two proposed algorithms in accurately and efficiently generating sparse solutions of the compressed sensing problem.

Key words: Compressed sensing; sparse optimization; homotopy method; iterative hard thresholding method; proximal gradient method.

1 Introduction

In this paper, we present two homotopy methods based on the $\ell_0$ norm to approximately solve the following compressed sensing (CS) problem,

$$\min_x \|x\|_0 \quad \text{s.t.} \quad Ax = y, \qquad (1)$$

where $x \in \mathbb{R}^n$ is the vector of unknowns, $A \in \mathbb{R}^{m \times n}$ ($m \ll n$, and $A$ is assumed to have full row rank) and $y \in \mathbb{R}^m$ are the problem data, and $\|x\|_0$ is the number of nonzero components of $x$.

Over the last decade, guided by the principle of using the simplest representation to explain a given problem or phenomenon, the sparse model has been pursued in signal processing and many other fields.

Typical applications include denoising, linear regression, inverse problems, model selection, machine learning, and image processing. Although finding an optimal solution of problem (1) is NP-hard [22], a large number of methods exist to solve the problem approximately. These methods can be categorized into four classes: (i) combinatorial methods, e.g., Matching Pursuit (MP) [21], Orthogonal Matching Pursuit (OMP) [11], Least Angle Regression (LARS) [13]; (ii) $\ell_1$-norm regularization methods, e.g., Gradient Projection (GP) [14], Iterative Shrinkage-Thresholding (IST) [10], Accelerated Proximal Gradient (APG) [1, 23], the Iterative Reweighted Method [7], the Alternating Direction Method (ADM) [27], and homotopy methods [20, 25]; (iii) $\ell_p$-norm ($0 < p < 1$) regularization methods [8, 9, 16]; (iv) $\ell_0$-norm regularization methods, e.g., the Penalty Decomposition (PD) method [19] and Iterative Hard Thresholding (IHT) methods [3, 4, 18, 24, 25].

To overcome the difficulty of solving the sparse problem (1), and because the $\ell_1$ norm promotes sparsity [6], the $\ell_1$-norm regularization methods replace the $\ell_0$ norm by the $\ell_1$ norm,

$$\min_x \|x\|_1 \quad \text{s.t.} \quad Ax = y. \qquad (2)$$

Regularized by a parameter $\lambda$, problem (2) can be written as the following popular $\ell_1$-regularized least-squares ($\ell_1$-LS) problem,

$$\min_x \tfrac{1}{2}\|Ax - y\|^2 + \lambda \|x\|_1, \qquad (3)$$

where $\|x\|_1 = \sum_{i=1}^n |x_i|$. Problem (3) can be solved by Nesterov's first-order methods [23], i.e., the primal gradient (PG) method, the dual gradient (DG) method, and the accelerated dual gradient (AC) method. The PG and DG methods have convergence rate $O(1/k)$, where $k$ is the number of iteration steps, and the AC method has the faster convergence rate $O(1/k^2)$. Similar PG methods can be found in [25, 26].

Some methods solve the sparse problem (1) based on the $\ell_0$ norm directly, such as the Iterative Hard Thresholding (IHT) methods [3, 4, 18, 24, 25] and the Penalty Decomposition (PD) method [19], and provide much better sparse solutions of the sparsity problem. When the problem is regularized directly by the $\ell_0$ norm, they solve the following $\ell_0$-regularized least-squares ($\ell_0$-LS) problem,

$$\min_x \tfrac{1}{2}\|Ax - y\|^2 + \lambda \|x\|_0. \qquad (4)$$

Recently, Blumensath and Davies [3, 4] presented two IHT methods to solve the $\ell_0$-LS problem (4) and the $s$-sparse model, respectively. Experimental results show that the IHT methods can be used to improve the results generated by other methods. Lu [18] extended the IHT methods to the $\ell_0$-regularized convex cone programming problem and showed that the IHT methods can converge to a local minimizer; he also gave the iteration complexity of the IHT methods for finding an $\epsilon$-optimal solution. Moreover, the properties of the global/local minimizers of problem (4) have been studied in [24].
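As background for the $\ell_1$-LS problem (3) mentioned above, the following is a minimal sketch of one proximal-gradient (iterative shrinkage-thresholding) step. It is not part of the methods proposed in this paper; the function names and the NumPy-based setting are illustrative assumptions.

import numpy as np

def soft_threshold(v, tau):
    # Soft-thresholding operator, the proximal map of tau * ||.||_1.
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def ista_step(A, y, x, lam, L):
    # One proximal-gradient step for 0.5*||Ax - y||^2 + lam*||x||_1,
    # where L is an upper bound on the Lipschitz constant ||A||^2 of the gradient.
    grad = A.T @ (A @ x - y)
    return soft_threshold(x - grad / L, lam / L)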

The above-mentioned methods, whether for the $\ell_1$-norm regularized problem or for the $\ell_0$-norm regularized problem, use a fixed value of the regularization parameter $\lambda$, although suitable values of the parameter can generate better solutions. Unfortunately, there is no rule for selecting a suitable value of the regularization parameter, so finding a good regularization parameter remains a challenging task [2]. Homotopy methods can overcome this difficulty: they efficiently compute and trace the solutions of the sparse problem along a continuation path. Recently, Xiao and Zhang [26] proposed a homotopy method (the PGH method) for the $\ell_1$-norm regularized problem (3) that finds an $\epsilon$-optimal solution quickly and efficiently, with overall iteration complexity $O(\log(1/\epsilon))$. Later, Lin and Xiao [17] proposed an adaptively accelerated variant of this method together with a complexity analysis.

Following this line, two homotopy methods (called the HIHT and AHIHT methods), which combine the advantage of the homotopy technique with the effectiveness of the IHT method, are presented in this paper for solving the compressed sensing problem (1) directly. Under some mild assumptions, convergence of the proposed methods is proved. Compared with the convergence analysis of the IHT method in [3], the assumption that $\|A\| < 1$ is not needed. To improve the solution quality of the two methods, we modify them and give two empirical algorithms. Experimental results show that the modified $\ell_0$-norm based homotopy methods (HIHT/AHIHT) are more efficient and effective than the $\ell_1$-norm based homotopy methods (PGH/ADGH) [26]. Moreover, although the HIHT/AHIHT methods cannot handle a very small regularization parameter $\lambda$ well enough, they can use a smaller $\lambda$ than the PGH method; in fact, the regularization parameter $\lambda$ of the PGH method cannot be too small [26].

The rest of this paper is organized as follows. Section 2 contains the preliminaries, in which some notation and the IHT method for $\ell_0$-norm regularized convex programming are described. Section 3 presents two homotopy methods for the $\ell_0$-norm regularized problem (4), which combine the homotopy technique with the IHT method; convergence analyses are given, and we prove that the limits of the sequences generated by the proposed homotopy methods are feasible solutions of problem (1) and, under some conditions, local minimizers of the problem. In Section 4, we modify the two methods and give two empirical algorithms. Numerical experiments are reported in Section 5, and conclusions are drawn in Section 6.

2 Preliminaries

2.1 Notations

In this subsection, some notation is introduced to simplify the presentation. Unless stated otherwise, all norms used are the Euclidean norm, denoted by $\|\cdot\|$. The transpose of a vector $x \in \mathbb{R}^n$ is denoted by $x^H$. Given an index set $I \subseteq \{1, \dots, n\}$, $x_I$ denotes the subvector formed by the components of $x$ indexed by $I$. $E$ denotes the identity matrix.

The index set of the nonzero components of a vector $x$, called the support set, is denoted by $S(x) = \{i : x_i \neq 0\}$. Let $\bar{S}(x)$ be the complement of $S(x)$, i.e., $\bar{S}(x) = \{1, 2, \dots, n\} \setminus S(x) = \{i : x_i = 0\}$. The size of $S(x)$ is denoted by $|S(x)|$. Given an index set $I \subseteq \{1, \dots, n\}$, the set of vectors whose subvector indexed by $I$ is zero is denoted by $B_I = \{x \in \mathbb{R}^n : x_I = 0\}$.

For ease of statement, problem (4) is rewritten as

$$\min_x \ \phi_\lambda(x) = f(x) + \lambda \|x\|_0, \qquad (5)$$

where $f(x) = \frac{1}{2}\|Ax - y\|^2$ is a differentiable convex function whose gradient is Lipschitz continuous (its Lipschitz constant is denoted by $L_f$).

2.2 IHT method

In this subsection, the IHT method [18] for $\ell_0$-regularized convex programming and some of its main results are described. To solve problem (5), the main idea of IHT is to use the proximal point technique at each iteration: the function $f(x)$ is approximated by a quadratic function at the current solution $x^0$, while the second term in problem (5) is kept unchanged. This yields the following function:

$$p_{L,\lambda}(x^0, x) = f(x^0) + \langle \nabla f(x^0), x - x^0 \rangle + \frac{L}{2}\|x - x^0\|^2 + \lambda \|x\|_0, \qquad (6)$$

where the quadratic term is the proximal term and $L > 0$ is a constant, which should essentially be an upper bound on the Lipschitz constant of $\nabla f(x)$, i.e., $L \geq L_f$. The minimizer of $p_{L,\lambda}(x^0, \cdot)$ is the same as that of the problem

$$\min_x \left[\, \Big\| x - \Big(x^0 - \frac{1}{L}\nabla f(x^0)\Big) \Big\|^2 + \frac{2\lambda}{L}\|x\|_0 \right].$$

Hence the minimizer of $p_{L,\lambda}(x^0, \cdot)$ can be obtained by the hard thresholding operator [18, 25]. If we denote

$$T_L(x^0) = \operatorname*{argmin}_x \ p_{L,\lambda}(x^0, x), \qquad (7)$$

then a closed-form solution of (7) is given by the following Lemma 2.1.

Lemma 2.1. [18, 25] A solution $T_L(x^0)$ of problem (7) is given by

$$[T_L(x^0)]_i = \begin{cases} [s_L(x^0)]_i, & \text{if } [s_L(x^0)]_i^2 > \frac{2\lambda}{L}; \\ 0, & \text{if } [s_L(x^0)]_i^2 < \frac{2\lambda}{L}; \\ 0 \ \text{or} \ [s_L(x^0)]_i, & \text{otherwise}, \end{cases}$$

where $s_L(x) = x - \frac{1}{L}\nabla f(x)$ and $[\cdot]_i$ denotes the $i$-th component of a vector.
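The following is a minimal NumPy sketch of the hard thresholding operator of Lemma 2.1, specialized to the least-squares case $f(x) = \frac{1}{2}\|Ax - y\|^2$; the function name and the tie-breaking to zero (matching the remark in the next subsection) are assumptions made for illustration.

import numpy as np

def hard_threshold_operator(A, y, x0, lam, L):
    # Closed-form minimizer T_L(x0) of p_{L,lambda}(x0, .) from Lemma 2.1,
    # with f(x) = 0.5*||Ax - y||^2, so grad f(x0) = A^T (A x0 - y).
    s = x0 - A.T @ (A @ x0 - y) / L      # gradient step s_L(x0)
    keep = s**2 > 2.0 * lam / L          # keep components whose square exceeds 2*lam/L
    return np.where(keep, s, 0.0)        # components at or below the threshold are set to zero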

Remark. To obtain a unique solution, in the following algorithms we set $[T_L(x^0)]_i = 0$ when $[s_L(x^0)]_i^2 = \frac{2\lambda}{L}$.

The core of the basic IHT method is to solve the subproblem (7) repeatedly until some termination condition is reached. However, the basic IHT method uses a fixed $L$ throughout all iterations, which may miss some local information of $f(x)$ at the current step; moreover, an upper bound on $L_f$ may be unknown or not easily computed. To improve its practical performance, a suitable value of $L$ is obtained dynamically by iteratively increasing its value. The frame of viHT, a variant of the IHT method, is presented as follows.

Algorithm 1: [18] $\{x^*, L^*\} \leftarrow$ viHT$(L_0, \lambda, x^0)$
Input: $L_0$, $x^0$, $\lambda$, $L_{\min}$, $L_{\max}$;  // $L_0 \in [L_{\min}, L_{\max}]$
Output: $\{x^*, L^*\}$;
1: initialization: $k \leftarrow 0$, $\gamma > 1$, $\eta > 0$;
2: repeat
3:   $x^{k+1} \leftarrow T_{L_k}(x^k)$;
4:   while $\phi_\lambda(x^k) - \phi_\lambda(x^{k+1}) < \frac{\eta}{2}\|x^k - x^{k+1}\|^2$ do
5:     $L_k \leftarrow \min\{\gamma L_k, L_{\max}\}$;
6:     $x^{k+1} \leftarrow T_{L_k}(x^k)$;
7:   end while
8:   $L_{k+1} \leftarrow L_k$;
9:   $k \leftarrow k + 1$;
10: until some termination condition is reached
11: $x^* \leftarrow x^k$;
12: $L^* \leftarrow L_k$.

Remark. (1) Lu [18] presented a strategy for the initial value of $L_0$ in Algorithm 1,

$$L_0 = \min\left\{ \max\left\{ \frac{\Delta f^H \Delta x}{\|\Delta x\|^2},\ L_{\min} \right\},\ L_{\max} \right\}, \qquad (8)$$

where $\Delta x = x^k - x^{k-1}$, $\Delta f = \nabla f(x^k) - \nabla f(x^{k-1})$, and $[L_{\min}, L_{\max}]$ is the interval in which $L$ takes its values.

(2) For each outer loop, the number of iterations between lines 4-7 of Algorithm 1 is finite; in other words, the line search terminates in a finite number of steps. In fact, for the outer iteration of Algorithm 1, if $L_k > L_f$, then one can show that ([18])

$$\phi_\lambda(x^k) - \phi_\lambda(x^{k+1}) \geq \frac{L_k - L_f}{2}\|x^{k+1} - x^k\|^2, \qquad (9)$$

which implies that the inner line search stops once $L_k \geq L_f + \eta$. Thus $L_k/\gamma \leq L_f + \eta$ holds, that is, $L_k \leq \gamma(L_f + \eta)$. Let $\hat{n}_k$ be the number of inner loops at the $k$-th outer iteration. Then one gets

$$L_{\min}\,\gamma^{\hat{n}_k - 1} \leq L_0\,\gamma^{\hat{n}_k - 1} \leq L_k \leq \gamma(L_f + \eta).$$
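A minimal sketch of Algorithm 1 (viHT) follows, reusing the hard_threshold_operator sketch above. The objective helper, the termination test on the infinity norm (anticipating condition (26) of Section 4), and the default parameter values are illustrative assumptions rather than the authors' implementation.

import numpy as np

def phi(A, y, x, lam):
    # Objective phi_lambda(x) = 0.5*||Ax - y||^2 + lam*||x||_0 of problem (5).
    return 0.5 * np.linalg.norm(A @ x - y)**2 + lam * np.count_nonzero(x)

def viht(A, y, x0, lam, L0, L_min=1e-8, L_max=1e12, gamma=2.0, eta=1.0,
         tol=1e-5, max_iter=1000):
    # Variant of the IHT method (Algorithm 1) with a backtracking search on L.
    x = x0.copy()
    L = min(max(L0, L_min), L_max)
    for _ in range(max_iter):
        x_new = hard_threshold_operator(A, y, x, lam, L)
        # lines 4-7: increase L until a sufficient decrease is obtained
        while phi(A, y, x, lam) - phi(A, y, x_new, lam) < 0.5 * eta * np.linalg.norm(x - x_new)**2:
            if L >= L_max:
                break
            L = min(gamma * L, L_max)
            x_new = hard_threshold_operator(A, y, x, lam, L)
        if np.max(np.abs(x_new - x)) <= tol:   # a simple termination condition
            return x_new, L
        x = x_new
    return x, L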

Therefore,

$$\hat{n}_k \leq \frac{\log(L_f + \eta) - \log(L_{\min})}{\log(\gamma)} + 2.$$

Assumption. In the sequel, we always assume that $L_k \geq L_f + \eta$ ($\eta > 0$), which is reasonable since $L_k$ is increased by a factor of $\gamma$ at step 5 of Algorithm 1.

Before proceeding, we give the following lemmas, which will be used later.

Lemma 2.2. Let $\{x^k\}$ be generated by Algorithm 1. Then $p_{L_k,\lambda}(x^k, x^{k+1})$ is nonincreasing.

Proof. From Algorithm 1, one gets

$$\begin{aligned}
p_{L_k,\lambda}(x^k, x^{k+1}) &= f(x^k) + \langle \nabla f(x^k), x^{k+1} - x^k \rangle + \frac{L_k}{2}\|x^{k+1} - x^k\|^2 + \lambda\|x^{k+1}\|_0 \\
&\leq f(x^k) + \lambda\|x^k\|_0 \\
&\leq f(x^{k-1}) + \langle \nabla f(x^{k-1}), x^k - x^{k-1} \rangle + \frac{L_{k-1}}{2}\|x^k - x^{k-1}\|^2 + \lambda\|x^k\|_0 \\
&= p_{L_{k-1},\lambda}(x^{k-1}, x^k),
\end{aligned}$$

where the second inequality follows from the Lipschitz continuity of $\nabla f$ and the assumption that $L_{k-1} > L_f$. Thus $p_{L_k,\lambda}(x^k, x^{k+1})$ is nonincreasing.

Lemma 2.3. [18] Let $\{x^k\}$ be generated by Algorithm 1. Then $S(x^k)$ does not change when $k$ is large enough. Moreover, for all $k \geq 0$, if $x^{k+1}_j \neq 0$, then $|x^{k+1}_j| \geq \sqrt{2\lambda/L_k}$.

Next we give a definition of a local minimizer of problem (1). If we used the definition of local minimizer for continuous optimization problems, it would be easy to verify that every feasible solution of problem (1) is a local minimizer, which is not useful for solving the problem. Hence we have to take the characteristics of the problem into account when giving a useful definition of local minimizer.

Definition 2.1. If $Ax = y$ and the columns of $A$ corresponding to the nonzero components of $x$ are linearly independent, then $x$ is called a local minimizer of problem (1).

Definition 2.1 is reasonable, since we have the following sufficient and necessary condition.

Theorem 2.1. $x^*$ is a local minimizer of problem (1) if and only if $Ax^* = y$ and, for every $h \in S(x^*)$, the system $A_{S(x^*)\setminus\{h\}}\, x_{S(x^*)\setminus\{h\}} = y$ has no solution.

Proof. Suppose that $x^*$ is a local minimizer of problem (1). Without loss of generality, assume that $S(x^*) = \{1, 2, \dots, p\}$. Since $x^*$ is feasible, we have

$$A_1 x^*_1 + A_2 x^*_2 + \cdots + A_p x^*_p = y, \qquad (10)$$

where $A_i$ is the $i$-th column of $A$.

Now take the $p$-th component of $x^*$ out of the support set, and consider the equation

$$A_1 x_1 + A_2 x_2 + \cdots + A_{p-1} x_{p-1} = y.$$

If it has a solution, denote it by $\hat{x}$. Then

$$A_1 \hat{x}_1 + A_2 \hat{x}_2 + \cdots + A_{p-1} \hat{x}_{p-1} = y. \qquad (11)$$

By (10)-(11), we have

$$A_1(x^*_1 - \hat{x}_1) + A_2(x^*_2 - \hat{x}_2) + \cdots + A_{p-1}(x^*_{p-1} - \hat{x}_{p-1}) + A_p x^*_p = 0.$$

Thus, if $A_1, A_2, \dots, A_p$ are linearly independent, then $x^*_p = 0$, which contradicts the assumption that $x^*_p \neq 0$; hence the conclusion holds. The converse direction can be proved in a similar manner and is omitted here.

Hence we have given a discrete version of the definition of a local minimizer of problem (1). To prove that the solutions produced by our algorithms are local minimizers in the sense of Definition 2.1, we need the definition of spark, which was first given in [12].

Definition 2.2. [12] The spark of a given matrix $A$, denoted by $\mathrm{Spark}(A)$, is the smallest number of columns of $A$ that are linearly dependent.

3 Homotopy IHT methods

The sparse optimization problem (5) is regularized by the $\ell_0$ norm, balanced by a regularization parameter $\lambda$. To solve the sparse optimization problem, the iterative hard thresholding methods use a fixed value of the regularization parameter. Obviously, a suitable value of the parameter $\lambda$ may produce better solutions, but unfortunately, finding a good value of the regularization parameter is a challenging task. Hence a homotopy method is given here to efficiently compute and trace the solutions of the sparse optimization problem along a homotopy path.

The main idea of the homotopy method based on the $\ell_0$ norm is to set a large initial value of the regularization parameter $\lambda$ and gradually decrease it according to some strategy. For every fixed value of the regularization parameter $\lambda$, the viHT method is used to find an approximate optimal solution of problem (5), which is then used as the initial solution for the next iteration. This kind of initialization strategy is called warm-starting. Usually, the next loop with warm-starting requires fewer iterations than the current loop, and dramatically fewer than the number of iterations needed when initializing at zero [25].

The initial value of the regularization parameter is set as $\lambda_0 = c\|y\|^2$ with $0 < c < 1$, where $y$ is the data vector in (1).
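The following is a small NumPy sketch of a check for Definition 2.1, i.e., feasibility plus linear independence of the support columns; the function name, the tolerance, and the use of a rank computation are illustrative assumptions.

import numpy as np

def is_local_minimizer(A, x, y, tol=1e-8):
    # Definition 2.1: Ax = y and the columns of A indexed by the support of x
    # are linearly independent (i.e., the submatrix A_S has full column rank).
    if np.linalg.norm(A @ x - y) > tol:
        return False
    support = np.flatnonzero(np.abs(x) > tol)
    if support.size == 0:
        return True
    return np.linalg.matrix_rank(A[:, support]) == support.size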

This choice is motivated by the fact that if $\lambda > \frac{1}{2}\|y\|^2$, then $\phi_\lambda(x)$ has a strict global minimum at the origin [24]; indeed, for every $x$ with $\|x\|_0 \geq 1$,

$$\phi_\lambda(0) = \frac{1}{2}\|y\|^2 < \lambda\|x\|_0 \leq \frac{1}{2}\|Ax - y\|^2 + \lambda\|x\|_0 = \phi_\lambda(x).$$

Moreover, $\lambda$ is decreased geometrically: for an initial value $\lambda_0$ and a parameter $\rho \in (0, 1)$, we set $\lambda_{k+1} = \rho\lambda_k$ for $k = 0, 1, 2, \dots$, until some termination condition is reached. The upper bound $L_k$ on the Lipschitz constant $L_f$ is obtained by a line search technique: the line search starts from an initial value $L_k^0$, e.g., given by equation (8), and then increases $L_k$ by a factor of $\gamma > 1$ until the condition in step 4 of Algorithm 1 is satisfied. The frame of the proposed homotopy iterative hard thresholding (HIHT) algorithm is described in Algorithm 2.

Algorithm 2: $\{x^*, L^*\} \leftarrow$ HIHT$(L_0, \lambda_0, x^0)$
Input: $L_0$, $\lambda_0$, $x^0$, $L_{\min}$, $L_{\max}$;  // $L_0 \in [L_{\min}, L_{\max}]$
Output: $\{x^*, L^*\}$;
1: initialization: $k \leftarrow 0$, $\rho \in (0, 1)$;
2: repeat
3:   $i \leftarrow 0$;
4:   $x^{k,0} \leftarrow x^k$;
5:   $L_{k,0} \leftarrow L_k$;
6:   repeat
7:     $x^{k,i+1} \leftarrow T_{L_{k,i}}(x^{k,i})$;
8:     while $\phi_{\lambda_k}(x^{k,i}) - \phi_{\lambda_k}(x^{k,i+1}) < \frac{\eta}{2}\|x^{k,i} - x^{k,i+1}\|^2$ do
9:       $L_{k,i} \leftarrow \min\{\gamma L_{k,i}, L_{\max}\}$;
10:      $x^{k,i+1} \leftarrow T_{L_{k,i}}(x^{k,i})$;
11:    end while
12:    $L_{k,i+1} \leftarrow L_{k,i}$;
13:    $i \leftarrow i + 1$;
14:  until $S(x^{k,i})$ does not change as $i$ increases
15:  $x^{k+1} \leftarrow x^{k,i}$;
16:  $L_{k+1} \leftarrow L_{k,i}$;
17:  $\lambda_{k+1} \leftarrow \rho\lambda_k$;
18:  $k \leftarrow k + 1$;
19: until some termination condition is reached
20: $x^* \leftarrow x^k$;
21: $L^* \leftarrow L_k$;

Remark. According to Lemma 2.3, the iteration between steps 6-14 terminates in a finite number of steps.
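A minimal sketch of the HIHT homotopy loop of Algorithm 2 follows, reusing the viht sketch above as the inner solver. Terminating the inner stage by the tolerance inside the viht sketch (instead of literally monitoring the support set) and stopping the outer loop at a target value lam_min follow the empirical variant described later in Section 4; the parameter defaults are assumptions.

def hiht(A, y, x0, lam0, L0, rho=0.7, lam_min=1e-2, **viht_kwargs):
    # Homotopy loop of Algorithm 2: solve problem (5) approximately for a
    # decreasing sequence of regularization parameters, warm-starting each
    # stage with the previous solution.
    x, L, lam = x0.copy(), L0, lam0
    while lam > lam_min:
        x, L = viht(A, y, x, lam, L, **viht_kwargs)   # steps 6-14 (warm-started)
        lam *= rho                                    # step 17: lambda_{k+1} = rho * lambda_k
    return x, L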

The convergence analysis of Algorithm 2 is presented in the following theorem. The proof is based on the convergence analysis of IHT [5, 18].

Theorem 3.1. Let $\{x^{k,i}\}$ be the sequence generated by steps 6-14 of Algorithm 2, and $\{x^k\}$ the sequence generated by step 15 of Algorithm 2. We have:

(i) $\{x^{k,i}\}$ and $\{\phi_{\lambda_k}(x^{k,i})\}$ converge.

(ii) Either $S(x^{k,i})$ changes only in a finite number of iterations, or for all $h \in \Gamma^{k,i}$, $x^{k,i}_h$ or $x^{k,i+1}_h$ is an arbitrarily small amount, i.e., for any $\varepsilon > 0$ there exists $K > 0$ such that when $k \geq K$, $|x^{k,i}_h| < \varepsilon$ or $|x^{k,i+1}_h| < \varepsilon$, where $\Gamma^{k,i}$ is the set of indices of components at which zero components of $x^{k,i}$ are set to nonzero, or nonzero components are set to zero.

(iii) If $A$ has full row rank, then any limit point of $\{x^{k,i}\}$ is feasible for problem (1), i.e., if $x^* = \lim_{k\to\infty} x^{k,i}$, then $Ax^* = y$.

(iv) Suppose that $S(x^{k,i})$ does not change after a sufficiently large $k$. Let $\phi^* = \lim_{k\to\infty}\phi_{\lambda_k}(x^k)$. Then the number of changes of $S(x^k)$ is at most

$$T^* = \frac{2(\phi_{\lambda_0}(x^0) - \phi^*)}{\eta\delta^2 + 2(1-\rho)\bar{\lambda}},$$

where $\bar{\lambda}$ denotes the value of $\lambda_k$ from which $S(x^k)$ remains unchanged, $\delta = \min_j \delta_j$, and $\delta_j = \max\left\{\sqrt{2\lambda_{m_j}/L_{m_j+1}},\ \sqrt{2\lambda_{m_j-1}/L_{m_j}}\right\}$, with $m_j + 1$, $j = 1, 2, \dots, J$, denoting the iterations at which $S(x^k)$ changes.

Proof. (i) First, for each fixed $\lambda_k$, steps 6-14 terminate in a finite number of iterations by Lemma 2.3. Furthermore, by Lemma 2.2 and the choice of $x^{k,i}$ in Algorithm 2, one obtains

$$\begin{aligned}
\phi_{\lambda_k}(x^k) &= f(x^k) + \lambda_k\|x^k\|_0 \\
&\geq f(x^k) + \langle \nabla f(x^k), x^{k,1} - x^k \rangle + \frac{L_{k,0}}{2}\|x^{k,1} - x^k\|^2 + \lambda_k\|x^{k,1}\|_0 \\
&= f(x^{k,0}) + \langle \nabla f(x^{k,0}), x^{k,1} - x^{k,0} \rangle + \frac{L_{k,0}}{2}\|x^{k,1} - x^{k,0}\|^2 + \lambda_k\|x^{k,1}\|_0 \\
&\geq \cdots \\
&\geq f(x^{k,n_k-1}) + \langle \nabla f(x^{k,n_k-1}), x^{k+1} - x^{k,n_k-1} \rangle + \frac{L_{k,n_k-1}}{2}\|x^{k+1} - x^{k,n_k-1}\|^2 + \lambda_k\|x^{k+1}\|_0, \qquad (12)
\end{aligned}$$

where $n_k$ is the number of iterations between steps 6-14 at the $k$-th outer iteration. Then for each $\lambda_k$, when $i \geq n_k$,

$$S(x^{k,i}) = S(x^{k,n_k}) \quad \text{and} \quad S(x^{k,n_k}) = S(x^{k,n_k-1}). \qquad (13)$$

On the other hand, since $\nabla f$ is Lipschitz continuous, one can observe that

$$\phi_{\lambda_{k+1}}(x^{k+1}) = f(x^{k+1}) + \lambda_{k+1}\|x^{k+1}\|_0 \leq f(x^{k,n_k-1}) + \langle \nabla f(x^{k,n_k-1}), x^{k+1} - x^{k,n_k-1} \rangle + \frac{L_f}{2}\|x^{k+1} - x^{k,n_k-1}\|^2 + \lambda_{k+1}\|x^{k+1}\|_0,$$

which together with (12), $x^{k+1} = x^{k,n_k}$ and $L_{k,n_k-1} \geq L_f + \eta$ implies that

$$\begin{aligned}
\phi_{\lambda_k}(x^k) - \phi_{\lambda_{k+1}}(x^{k+1}) &\geq \frac{L_{k,n_k-1} - L_f}{2}\|x^{k+1} - x^{k,n_k-1}\|^2 + (\lambda_k - \lambda_{k+1})\|x^{k+1}\|_0 \\
&\geq \frac{\eta}{2}\|x^{k+1} - x^{k,n_k-1}\|^2 + \lambda_k(1-\rho)\|x^{k+1}\|_0 \qquad (14) \\
&\geq \lambda_k(1-\rho)\|x^{k+1}\|_0 \geq 0.
\end{aligned}$$

Thus $\phi_{\lambda_k}(x^k)$ is nonincreasing. Furthermore, $\phi_\lambda(x)$ is bounded below for $\lambda \geq 0$. Therefore $\phi_{\lambda_k}(x^k)$ converges as $k \to \infty$.

By the proof of Theorem 3.4 in [18] (i.e., for every fixed $\lambda_k$, $\phi_{\lambda_k}(x^{k,i})$ is nonincreasing in $i$), the convergence of $\phi_{\lambda_k}(x^k)$, and the fact that $\lambda_k$ is nonincreasing, one can observe that the sequence

$$\phi_{\lambda_0}(x^0) \geq \phi_{\lambda_0}(x^{0,0}) \geq \phi_{\lambda_0}(x^{0,1}) \geq \cdots \geq \phi_{\lambda_0}(x^{0,n_0}) \geq \phi_{\lambda_0}(x^1) \geq \phi_{\lambda_1}(x^1) \geq \phi_{\lambda_1}(x^{1,0}) \geq \cdots \geq \phi_{\lambda_1}(x^{1,n_1}) \geq \phi_{\lambda_1}(x^2) \geq \cdots \geq \phi_{\lambda_k}(x^k) \geq \phi_{\lambda_k}(x^{k,0}) \geq \cdots \geq \phi_{\lambda_k}(x^{k,n_k}) \geq \phi_{\lambda_k}(x^{k+1}) \geq \cdots$$

is nonincreasing and converges.

Next, since $\phi_{\lambda_k}(x^k) - \phi_{\lambda_{k+1}}(x^{k+1}) \to 0$ as $k \to \infty$, the second inequality in (14) gives

$$\|x^{k+1} - x^{k,n_k-1}\| = \|x^{k,n_k} - x^{k,n_k-1}\| \to 0 \quad \text{as } k \to \infty.$$

Similarly, $\|x^{k,i+1} - x^{k,i}\| \to 0$ as $k \to \infty$, for $i = 0, 1, 2, \dots, n_k - 1$. Thus the sequence

$$x^0,\ x^{0,0}, x^{0,1}, \dots, x^{0,n_0},\ x^1,\ x^{1,0}, x^{1,1}, \dots, x^{1,n_1},\ x^2,\ \dots,\ x^k,\ x^{k,0}, x^{k,1}, \dots, x^{k,n_k},\ x^{k+1},\ \dots$$

converges, i.e., for any $\varepsilon > 0$ there exists $K > 0$ such that when $k \geq K$, $\|x^{k,i} - x^{k,i+1}\| < \varepsilon$ holds for all $i \in \{0, 1, 2, \dots, n_k - 1\}$.

(ii) There are two possibilities for $S(x^{k,i})$: either $S(x^{k,i})$ does not change after a sufficiently large $k$, or $S(x^{k,i})$ changes infinitely often.

In the second case, since $\{x^{k,i}\}$ converges, for any $\varepsilon > 0$ there exists $K > 0$ such that when $k \geq K$, $\|x^{k,i} - x^{k,i+1}\| < \varepsilon$. Thus $|x^{k,i}_h - x^{k,i+1}_h| < \varepsilon$ holds for all $h \in \{1, 2, \dots, n\}$. Let $\Gamma^{k,i}$ be the set of indices of components at which zero components of $x^{k,i}$ are set to nonzero, or nonzero components are set to zero. Then for all $h \in \Gamma^{k,i}$, $|x^{k,i}_h| < \varepsilon$ or $|x^{k,i+1}_h| < \varepsilon$.

(iii) By the hard thresholding operator (see Lemma 2.1), all components that are set to zero must satisfy

$$\left|\left[x^{k,i} - \frac{1}{L_{k,i}}\nabla f(x^{k,i})\right]_j\right| \leq \sqrt{\frac{2\lambda_k}{L_{k,i}}}.$$

Since $\lambda_k \to 0$ as $k \to \infty$, for any $\varepsilon > 0$ there exists $K_1$ such that when $k \geq K_1$,

$$\sqrt{\frac{2\lambda_k}{L_{k,i}}} < \varepsilon \quad \text{and} \quad \left|\left[x^{k,i} - \frac{1}{L_{k,i}}\nabla f(x^{k,i})\right]_j\right| < \varepsilon. \qquad (15)$$

Furthermore, since $x^{k,i+1}_j$ is set to $0$ and $\{x^{k,i}\}$ is convergent, for sufficiently large $K_1$ these components must satisfy

$$|x^{k,i}_j| = |x^{k,i}_j - x^{k,i+1}_j| < \varepsilon,$$

which together with (15) implies that

$$|\nabla_j f(x^{k,i})| \leq L_{k,i}\left(|x^{k,i}_j| + \varepsilon\right) < 2L_{k,i}\varepsilon.$$

On the other hand, by Lemma 2.1, all components that are set to nonzero must satisfy

$$x^{k,i}_j - x^{k,i+1}_j = \frac{1}{L_{k,i}}\nabla_j f(x^{k,i}).$$

Since $\{x^{k,i}\}$ is convergent, the above equality leads to

$$|x^{k,i}_j - x^{k,i+1}_j| = \frac{1}{L_{k,i}}|\nabla_j f(x^{k,i})| < \varepsilon$$

for sufficiently large $k$. Hence, if $k$ is large enough, then for all $j \in \{1, 2, \dots, n\}$, $|\nabla_j f(x^{k,i})| < 2L_{k,i}\varepsilon$. Thus, letting $\varepsilon \to 0$, we have $\nabla f(x^{k,i}) \to 0$, i.e., $A^H(Ax^{k,i} - y) \to 0$, so $A^H(Ax^* - y) = 0$. With the assumption that $A$ has full row rank, one obtains $Ax^* - y = 0$.

(iv) Suppose that $S(x^k)$ changes only finitely often. Without loss of generality, suppose that $S(x^k)$ changes only at $k = m_1 + 1, \dots, m_J + 1$; in other words,

$$S(x^{m_{j-1}+1}) = \cdots = S(x^{m_j}) \neq S(x^{m_j+1}) = \cdots = S(x^{m_{j+1}}), \quad j = 1, 2, \dots, J. \qquad (16)$$

Let $m_0 = 0$. For any $j \in \{1, 2, \dots, J\}$, there exists $i$ such that $x^{m_j+1}_i \neq 0$ and $x^{m_j}_i = 0$, or $x^{m_j+1}_i = 0$ and $x^{m_j}_i \neq 0$. Then by Lemma 2.1,

$$\|x^{m_j+1} - x^{m_j}\| \geq \max\{|x^{m_j+1}_i|, |x^{m_j}_i|\} \geq \max\left\{\sqrt{\frac{2\lambda_{m_j}}{L_{m_j+1}}},\ \sqrt{\frac{2\lambda_{m_j-1}}{L_{m_j}}}\right\} = \delta_j.$$

Let $\delta = \min_{j \in \{1,2,\dots,J\}} \delta_j$. We have

$$\|x^{m_j+1} - x^{m_j}\| \geq \delta_j \geq \delta, \quad j = 1, 2, \dots, J,$$

which together with (14) implies that

$$\phi_{\lambda_{m_j}}(x^{m_j}) - \phi_{\lambda_{m_j+1}}(x^{m_j+1}) \geq \frac{\eta}{2}\delta^2 + \lambda_{m_j}(1-\rho)\|x^{m_j+1}\|_0.$$

Summing up the above inequalities, one gets

$$\phi_{\lambda_0}(x^0) - \phi^* \geq \phi_{\lambda_{m_1}}(x^{m_1}) - \phi_{\lambda_{m_J+1}}(x^{m_J+1}) \geq \frac{\eta}{2}\delta^2 J + (1-\rho)(\lambda_{m_1} + \cdots + \lambda_{m_J}) \geq \frac{\eta}{2}\delta^2 J + (1-\rho)\bar{\lambda}J,$$

where $\bar{\lambda} = \lambda_{m_J}$, and the first inequality follows from the fact that $\{\phi_{\lambda_k}(x^k)\}$ is nonincreasing (see (14)). Thus

$$J \leq \frac{2(\phi_{\lambda_0}(x^0) - \phi^*)}{\eta\delta^2 + 2(1-\rho)\bar{\lambda}}.$$

Remark. (i) Although the above theorem states that $S(x^{k,i})$ may change, the relevant components change only by an arbitrarily small amount. Moreover, the experiments in Section 5 empirically show that $S(x^{k,i})$ changes only in a finite number of steps.

(ii) Let $\{x^k\}$ be the sequence generated by step 15 of Algorithm 2. By Theorem 3.1, the sequence $\{\phi_{\lambda_k}(x^k)\}$ is nonincreasing; in other words, Algorithm 2 is a descent algorithm.

Now we consider the following problem

$$\min_x \ \varphi_\lambda(x) = \|x\|_0 + \frac{1}{\lambda}f(x), \qquad (17)$$

which has the same solution set as problem (5). We apply to problem (17) a method similar to the one used for problem (5), and consider

$$q_{L,\lambda}(x^0, x) = \|x\|_0 + \frac{1}{\lambda}\left(f(x^0) + \langle \nabla f(x^0), x - x^0 \rangle + \frac{L}{2}\|x - x^0\|^2\right), \qquad (18)$$

which has the same minimizer as (6) and can be solved by the viHT method. Moreover, it is easy to verify that viHT applied to the two problems is equivalent. For ease of statement, let

$$h_L(x^0, x) = f(x^0) + \langle \nabla f(x^0), x - x^0 \rangle + \frac{L}{2}\|x - x^0\|^2. \qquad (19)$$

Then we can obtain a bound on the number of nonzero components of the limit of the sequence produced by Algorithm 2, and prove that the limit is a local minimizer of problem (1).

Theorem 3.2. Let $\{x^{k,i}\}$ be the sequence generated by steps 6-14 of Algorithm 2 and $x^* = \lim_{k\to\infty} x^{k,i}$. If for all $k$,

$$1 > \rho \geq \frac{h_{L_f}(x^{k,n_k-1}, x^{k+1})}{h_{L_{k,n_k-1}}(x^{k,n_k-1}, x^{k+1})},$$

then

(i) $\|x^*\|_0 \leq \|x^0\|_0 + \frac{1}{\lambda_0}f(x^0) - C$, where $C = \lim_{k\to\infty}\frac{1}{\lambda_k}f(x^k)$. In particular, if $x^0 = 0$, then $\|x^*\|_0 \leq \frac{\|y\|^2}{2\lambda_0} - C$;

(ii) $|S(x^k)|$ is constant when $k$ is large enough;

(iii) if $x^0 = 0$ and $\mathrm{Spark}(A) > \frac{\|y\|^2}{2\lambda_0} - C$, then $x^*$ is a local minimizer of problem (1).

Proof. (i) Similar to the proof of Theorem 3.1,

$$\begin{aligned}
\varphi_{\lambda_k}(x^k) &= \frac{1}{\lambda_k}f(x^k) + \|x^k\|_0 \\
&\geq \frac{1}{\lambda_k}\left(f(x^k) + \langle \nabla f(x^k), x^{k,1} - x^k \rangle + \frac{L_{k,0}}{2}\|x^{k,1} - x^k\|^2\right) + \|x^{k,1}\|_0 \\
&= \frac{1}{\lambda_k}\left(f(x^{k,0}) + \langle \nabla f(x^{k,0}), x^{k,1} - x^{k,0} \rangle + \frac{L_{k,0}}{2}\|x^{k,1} - x^{k,0}\|^2\right) + \|x^{k,1}\|_0 \\
&\geq \cdots \\
&\geq \frac{1}{\lambda_k}\left(f(x^{k,n_k-1}) + \langle \nabla f(x^{k,n_k-1}), x^{k+1} - x^{k,n_k-1} \rangle + \frac{L_{k,n_k-1}}{2}\|x^{k+1} - x^{k,n_k-1}\|^2\right) + \|x^{k+1}\|_0. \qquad (20)
\end{aligned}$$

On the other hand, since $\nabla f$ is Lipschitz continuous, one can observe that

$$\varphi_{\lambda_{k+1}}(x^{k+1}) = \frac{1}{\lambda_{k+1}}f(x^{k+1}) + \|x^{k+1}\|_0 \leq \frac{1}{\lambda_{k+1}}\left(f(x^{k,n_k-1}) + \langle \nabla f(x^{k,n_k-1}), x^{k+1} - x^{k,n_k-1} \rangle + \frac{L_f}{2}\|x^{k+1} - x^{k,n_k-1}\|^2\right) + \|x^{k+1}\|_0.$$

This, together with (20) and $x^{k+1} = x^{k,n_k}$, implies that

$$\varphi_{\lambda_k}(x^k) - \varphi_{\lambda_{k+1}}(x^{k+1}) \geq \frac{1}{\lambda_k}h_{L_{k,n_k-1}}(x^{k,n_k-1}, x^{k+1}) - \frac{1}{\lambda_{k+1}}h_{L_f}(x^{k,n_k-1}, x^{k+1}). \qquad (21)$$

Hence, by the assumption that

$$1 > \rho = \frac{\lambda_{k+1}}{\lambda_k} \geq \frac{h_{L_f}(x^{k,n_k-1}, x^{k+1})}{h_{L_{k,n_k-1}}(x^{k,n_k-1}, x^{k+1})}$$

and the fact that $\varphi_\lambda(x)$ is bounded below, we know that $\{\varphi_{\lambda_k}(x^k)\}$ is nonincreasing and converges. Then one gets

$$\|x^0\|_0 + \frac{1}{\lambda_0}f(x^0) \geq \|x^k\|_0 + \frac{1}{\lambda_k}f(x^k). \qquad (22)$$

Letting $k \to \infty$ and denoting by $x^*$ the limit of $\{x^k\}$, (22) becomes

$$\|x^0\|_0 + \frac{1}{\lambda_0}f(x^0) \geq \|x^*\|_0 + C,$$

where $C = \lim_{k\to\infty}\frac{1}{\lambda_k}f(x^k)$. In particular, if $x^0 = 0$, then we get $\|x^*\|_0 \leq \frac{\|y\|^2}{2\lambda_0} - C$.

(ii) Since $\{\varphi_{\lambda_k}(x^k)\}$ converges and the first term of $\|x^k\|_0 + \frac{1}{\lambda_k}f(x^k)$ takes discrete values, it is evident that $\|x^k\|_0$ is unchanged when $k$ is large enough.

(iii) Furthermore, if $\mathrm{Spark}(A) > \frac{\|y\|^2}{2\lambda_0} - C$, then by (i), starting from $x^0 = 0$, the limit $x^*$ of the sequence $\{x^k\}$ is a local minimizer of problem (1) according to Definitions 2.1 and 2.2.

Remark. (i) Without loss of generality, suppose that the observed data $y \neq 0$. Then it is easy to see that $\varphi_\lambda(x) > 0$. Furthermore, since $h_{L_f}(x^{k,n_k-1}, x^{k+1}) \geq f(x^{k+1})$, if $x^{k+1}$ is not a feasible solution, then $h_{L_f}(x^{k,n_k-1}, x^{k+1}) \geq f(x^{k+1}) > 0$. In this case, if $L_{k,n_k-1} > L_f$, then $h_{L_{k,n_k-1}}(x^{k,n_k-1}, x^{k+1}) > h_{L_f}(x^{k,n_k-1}, x^{k+1}) > 0$, and hence

$$\frac{h_{L_f}(x^{k,n_k-1}, x^{k+1})}{h_{L_{k,n_k-1}}(x^{k,n_k-1}, x^{k+1})} < 1,$$

which means that we can find values of $\rho$ that satisfy

$$1 > \rho = \frac{\lambda_{k+1}}{\lambda_k} \geq \frac{h_{L_f}(x^{k,n_k-1}, x^{k+1})}{h_{L_{k,n_k-1}}(x^{k,n_k-1}, x^{k+1})}.$$

(ii) Statements (i) and (iii) of Theorem 3.2 indicate that $x^0 = 0$ is a good initial solution for Algorithm 2, since it produces a solution of problem (1) with theoretical guarantees.

For each fixed $\lambda$, Algorithm 2 iterates between steps 6-14 to obtain a solution of problem (5).

In the following algorithm, we accelerate Algorithm 2 by calling just one outer loop (steps 2-10) of Algorithm 1 for every value of the regularization parameter $\lambda$. The main frame is given as follows.

Algorithm 3: $\{x^*, L^*\} \leftarrow$ AHIHT$(L_0, \lambda_0, x^0)$
Input: $L_0$, $\lambda_0$, $x^0$, $L_{\min}$, $L_{\max}$;
Output: $\{x^*, L^*\}$;
1: initialization: $k \leftarrow 0$, $\rho \in (0, 1)$;
2: repeat
3:   $x^{k+1} \leftarrow T_{L_k}(x^k)$;
4:   while $\phi_{\lambda_k}(x^k) - \phi_{\lambda_k}(x^{k+1}) < \frac{\eta}{2}\|x^k - x^{k+1}\|^2$ do
5:     $L_k \leftarrow \min\{\gamma L_k, L_{\max}\}$;
6:     $x^{k+1} \leftarrow T_{L_k}(x^k)$;
7:   end while
8:   $L_{k+1} \leftarrow L_k$;
9:   $\lambda_{k+1} \leftarrow \rho\lambda_k$;
10:  $k \leftarrow k + 1$;
11: until some termination condition is reached
12: $x^* \leftarrow x^k$;
13: $L^* \leftarrow L_k$;

Remark. Algorithm 3 differs from Algorithm 1 mainly in that the regularization parameter changes. Like Algorithm 2, it traces possible solutions along a homotopy path to overcome the difficulty of parameter selection.

The convergence analysis of Algorithm 3 is presented in the following theorem; its proof is similar to that of Theorem 3.1.

Theorem 3.3. Let $\{x^k\}$ be the sequence generated by steps 2-7 of Algorithm 3. We have:

(i) $\{x^k\}$ and $\{\phi_{\lambda_k}(x^k)\}$ converge.

(ii) Either $S(x^k)$ changes only in a finite number of iterations, or for all $h \in \Gamma^k$, $x^k_h$ or $x^{k+1}_h$ is an arbitrarily small amount, i.e., for any $\varepsilon > 0$ there exists $K > 0$ such that when $k \geq K$, $|x^k_h| < \varepsilon$ or $|x^{k+1}_h| < \varepsilon$, where $\Gamma^k$ is the set of indices of components at which zero components of $x^k$ are set to nonzero, or nonzero components are set to zero.

(iii) If $A$ has full row rank, then any limit point of $\{x^k\}$ is feasible for problem (1), i.e., if $x^* = \lim_{k\to\infty} x^k$, then $Ax^* = y$.

(iv) Suppose that $S(x^k)$ does not change after a sufficiently large $k$. Let $\phi^* = \lim_{k\to\infty}\phi_{\lambda_k}(x^k)$. Then the number of changes of $S(x^k)$ is at most

$$\hat{T}^* = \frac{2(\phi_{\lambda_0}(x^0) - \phi^*)}{\eta\hat{\delta}^2 + 2(1-\rho)\bar{\lambda}},$$

where $\bar{\lambda}$ denotes the value of $\lambda$ from which $S(x^k)$ remains unchanged, $\hat{\delta} = \min_j \hat{\delta}_j$, and $\hat{\delta}_j = \max\left\{\sqrt{2\lambda_{m_j-1}/L_{m_j-1}},\ \sqrt{2\lambda_{m_j}/L_{m_j}}\right\}$.
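Below is a minimal sketch of AHIHT (Algorithm 3), again reusing the phi and hard_threshold_operator sketches given earlier. Stopping when lambda reaches a target value lam_min follows the empirical variant of Section 4, and the default parameter values are assumptions.

import numpy as np

def ahiht(A, y, x0, lam0, L0, rho=0.7, lam_min=1e-2, L_max=1e12,
          gamma=2.0, eta=1.0):
    # One hard-thresholding step with a line search on L per value of lambda,
    # with lambda decreased geometrically at every outer iteration.
    x, L, lam = x0.copy(), L0, lam0
    while lam > lam_min:
        x_new = hard_threshold_operator(A, y, x, lam, L)
        while phi(A, y, x, lam) - phi(A, y, x_new, lam) < 0.5 * eta * np.linalg.norm(x - x_new)**2:
            if L >= L_max:
                break
            L = min(gamma * L, L_max)
            x_new = hard_threshold_operator(A, y, x, lam, L)
        x, lam = x_new, rho * lam
    return x, L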

Proof. (i) First, by the choice of $x^{k+1}$ in Algorithm 3, we have

$$\phi_{\lambda_k}(x^k) = f(x^k) + \lambda_k\|x^k\|_0 \geq f(x^k) + \langle \nabla f(x^k), x^{k+1} - x^k \rangle + \frac{L_k}{2}\|x^{k+1} - x^k\|^2 + \lambda_k\|x^{k+1}\|_0. \qquad (23)$$

On the other hand, by the Lipschitz continuity of $\nabla f$, one can observe that

$$\phi_{\lambda_{k+1}}(x^{k+1}) = f(x^{k+1}) + \lambda_{k+1}\|x^{k+1}\|_0 \leq f(x^k) + \langle \nabla f(x^k), x^{k+1} - x^k \rangle + \frac{L_f}{2}\|x^{k+1} - x^k\|^2 + \lambda_{k+1}\|x^{k+1}\|_0. \qquad (24)$$

Inequalities (23) and (24), together with $L_k \geq L_f + \eta$, imply that

$$\begin{aligned}
\phi_{\lambda_k}(x^k) - \phi_{\lambda_{k+1}}(x^{k+1}) &\geq \frac{L_k - L_f}{2}\|x^{k+1} - x^k\|^2 + (\lambda_k - \lambda_{k+1})\|x^{k+1}\|_0 \\
&\geq \frac{\eta}{2}\|x^{k+1} - x^k\|^2 + \lambda_k(1-\rho)\|x^{k+1}\|_0 \\
&\geq \frac{\eta}{2}\|x^{k+1} - x^k\|^2.
\end{aligned}$$

Hence $\phi_{\lambda_k}(x^k)$ is nonincreasing. Furthermore, $\phi_\lambda(x)$ is bounded below for $\lambda \geq 0$. Therefore $\phi_{\lambda_k}(x^k)$ converges as $k \to \infty$. The proofs of the other statements are similar to those of the corresponding statements of Theorem 3.1.

Similar to the proof of Theorem 3.2, we can obtain the following results.

Theorem 3.4. Let $\{x^k\}$ be the sequence generated by steps 2-11 of Algorithm 3 and $x^* = \lim_{k\to\infty} x^k$. If for all $k$, $1 > \rho \geq h_{L_f}(x^k, x^{k+1})/h_{L_k}(x^k, x^{k+1})$, then

(i) $\|x^*\|_0 \leq \|x^0\|_0 + \frac{1}{\lambda_0}f(x^0) - C$, where $C = \lim_{k\to\infty}\frac{1}{\lambda_k}f(x^k)$. In particular, if $x^0 = 0$, then $\|x^*\|_0 \leq \frac{\|y\|^2}{2\lambda_0} - C$;

(ii) $|S(x^k)|$ is constant when $k$ is large enough;

(iii) if $x^0 = 0$ and $\mathrm{Spark}(A) > \frac{\|y\|^2}{2\lambda_0} - C$, then $x^*$ is a local minimizer of problem (1).

4 Empirical algorithms

In the previous section, we proved that our methods converge to feasible solutions of problem (1). The theorems require that the regularization parameter $\lambda$ converge to zero. However, in our experiments (see Subsection 5.1), we have found that if $\lambda$ is too small, the experimental results are not good enough.

Hence, similar to the idea of the PGH method [26], we set a lower bound $\lambda_t$ on $\lambda$ in our HIHT/AHIHT methods. Moreover, since Algorithm 1 is a good method for the sparsity problem and may converge to a local minimizer of problem (5) [18], again following the idea of the PGH method [26], we use it to improve the solutions obtained by our HIHT/AHIHT methods.

The termination condition in step 14 of Algorithm 2 may not be easy to verify. Hence, for practical purposes, we modify it as follows: for each $\lambda_k$, the loop of steps 6-14 stops when

$$\|x^{k,i} - x^{k,i+1}\| \leq \epsilon_0, \qquad (25)$$

where $\epsilon_0 > 0$. The termination condition in step 10 of Algorithm 1 is set as follows: the algorithm stops when the infinity norm of the difference between two adjacent solutions in the generated sequence is smaller than a given precision $\epsilon > 0$, i.e.,

$$\|x^{k+1} - x^k\|_\infty \leq \epsilon. \qquad (26)$$

Since the solutions obtained by Algorithms 2 and 3 will be improved by Algorithm 1, we set $\epsilon_0 > \epsilon$ (e.g., $\epsilon = 10^{-5}$).

The main frame of the empirical algorithm is given as follows:

Algorithm 4: $\{x^*, L^*\} \leftarrow$ HomAlg$(L_0, \lambda_0, x^0)$
Input: $L_0$, $\lambda_0$, $x^0$;
Output: $\{x^*, L^*\}$;
1: initialization;
2: $\{\bar{x}, \bar{L}\} \leftarrow$ HIHT/AHIHT$(L_0, \lambda_0, x^0)$;  // the termination condition is that $\lambda$ reaches the lower bound $\lambda_t$
3: $\{x^*, L^*\} \leftarrow$ viHT$(\bar{L}, \lambda_t, \bar{x})$.  // the termination condition is that (26) is satisfied

5 Experiments

In this section, numerical experiments testing the performance of our HIHT/AHIHT methods on the CS problem are presented. In the experiments, our HIHT/AHIHT methods take the form of Algorithm 4; for simplicity, we still call them the HIHT/AHIHT methods, respectively.

In Subsection 5.1, the performance of our HIHT/AHIHT methods and of Algorithm 1 with different parameters is shown. The results indicate that our algorithms can overcome the shortcoming of Algorithm 1 to some degree. The separable approximation (SpaRSA) method [25] and the fixed point continuation (FPC) method [15] are also state-of-the-art methods for solving the sparse problem.
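The following is a minimal sketch of the two-stage empirical Algorithm 4, combining the hiht and viht sketches given earlier; the function names and the way the homotopy stage's output is passed to the polishing stage are straightforward assumptions based on the description above.

def hom_alg(A, y, x0, lam0, L0, lam_t=1e-2, rho=0.7):
    # Stage 1: homotopy stage (HIHT here; the AHIHT sketch could be used instead),
    # terminated when lambda reaches the lower bound lam_t.
    x_bar, L_bar = hiht(A, y, x0, lam0, L0, rho=rho, lam_min=lam_t)
    # Stage 2: polish the solution with viHT at the fixed parameter lam_t,
    # terminated by the tolerance check inside the viht sketch (condition (26)).
    return viht(A, y, x_bar, lam_t, L_bar)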

However, in the experiments of [26], the PGH and ADGH methods outperform them, so we only compare our methods with the PGH and ADGH methods$^1$ in Subsection 5.2. The results show that our methods outperform the PGH/ADGH methods both in solution quality and in running time.

All experiments were performed on a personal computer with an Intel(R) Core(TM)2 Duo CPU E7500 (2.93GHz) and 2GB memory, using a MATLAB toolbox.

5.1 Influence of the parameters

In these experiments, we mainly consider the CS problem with noise. When the observations are noisy, the compressed sensing model can be written as $y = A\bar{x} + z$, where $\bar{x} \in \mathbb{R}^n$ is the vector of unknowns, $A \in \mathbb{R}^{m\times n}$ ($m \ll n$) and $y \in \mathbb{R}^m$ are the problem data, and $z$ is the measurement noise. Then, for a fixed error level $\varepsilon > 0$, the CS problem (1) can be rewritten as

$$\min_x \|x\|_0 \quad \text{s.t.} \quad \|Ax - y\| \leq \varepsilon,$$

which can also be regularized as problem (4).

In the first experiment, we investigate the influence of the target value $\lambda_t$ and of the descent speed $\rho$ of the regularization parameter $\lambda$. An instance of the compressed sensing problem was generated similarly to that in [26]: it was generated randomly with $m = 1000$, $n = 5000$ and $|S(\bar{x})| = 100$, with the elements of the matrix $A$ distributed uniformly on the unit sphere. The vector $\bar{x}$ was generated with the same distribution at 100 randomly chosen coordinates. The noise $z$ was distributed randomly and uniformly in the sphere with ratio $r$. Finally, the vector $y$ was generated by $y = A\bar{x} + z$.

In the experiment, the parameters were set as follows. The initial value $\lambda_0$ of the regularization parameter $\lambda$ was set to $\|A^H y\|_\infty$. The initial value $L_0$ of the line search was set similarly to that in [26], i.e., $L_{\min} = \max_{1\leq j\leq n}\|A_j\|^2$, where $A_j$ is the $j$-th column of $A$; $L_{\max}$ was set to $\infty$. The factor $\gamma = 2$ controls the increasing speed of $L$, and $\eta = 1$. The initial solution of all algorithms was set to $x^0 = 0$, and the precisions were set to $\epsilon = 10^{-5}$ and $\epsilon_0 > \epsilon$ as in Section 4.

The results of the HIHT method and the AHIHT method are shown in Tables 1 and 2, respectively. In the tables, we report the CPU time (in seconds) required by each method, the size of the support set of the reconstructed signal $\hat{x}$, the mean squared error (MSE) with respect to $\bar{x}$, defined as $\mathrm{MSE} = \frac{1}{n}\|\hat{x} - \bar{x}\|$, and the residual error, defined as $\mathrm{Er} = \|A\hat{x} - y\|^2$.

$^1$The PGH/ADGH packages may be found at:
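Below is a rough NumPy sketch of how such a noisy test instance and the reported error measures could be generated and computed; the sampling of $A$ and of the noise described above is only paraphrased here, so the column normalization, the uniform noise range, and the function names are assumptions.

import numpy as np

def make_instance(m=1000, n=5000, k=100, noise_level=0.01, seed=0):
    # Random noisy CS instance in the spirit of Section 5.1 (details assumed).
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((m, n))
    A /= np.linalg.norm(A, axis=0)                       # normalize columns (assumption)
    x_bar = np.zeros(n)
    support = rng.choice(n, size=k, replace=False)
    x_bar[support] = rng.standard_normal(k)              # k-sparse ground truth
    z = rng.uniform(-noise_level, noise_level, size=m)   # uniform noise (assumption)
    y = A @ x_bar + z
    return A, x_bar, y

def mse(x_hat, x_bar):
    # MSE = (1/n) * ||x_hat - x_bar||, as defined above.
    return np.linalg.norm(x_hat - x_bar) / x_bar.size

def er(A, x_hat, y):
    # Er = ||A x_hat - y||^2, as defined above.
    return np.linalg.norm(A @ x_hat - y)**2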

Table 1: Test results of the HIHT method with various $\lambda_t$ and descent speeds $\rho$ of $\lambda$ (rows: time(s), $|S(\hat{x})|$, MSE and Er for each value of $\lambda_t$; columns: values of $\rho$).

Table 2: Test results of the AHIHT method with various $\lambda_t$ and descent speeds $\rho$ of $\lambda$ (rows: time(s), $|S(\hat{x})|$, MSE and Er for each value of $\lambda_t$; columns: values of $\rho$).

From the two tables, for every given $\lambda_t$, we find that the descent speed $\rho$ of the regularization parameter $\lambda$ affects to some degree the CPU time required by the corresponding method, but only slightly affects the quality of the solutions generated by our methods. For every given $\rho$, the target value $\lambda_t$ evidently affects the performance of our methods both in running time and in solution quality: although a larger value of $\lambda_t$ may reduce the CPU time, the quality of the solution deteriorates. Balancing the running time required by the methods against the quality of the solutions, we find that with $\lambda_t = 0.01$ our methods give better reconstructed signals.

Table 3: Influence of the regularization parameter $\lambda$ on the viHT method (rows: time(s), $|S(\hat{x})|$, MSE and Er for each value of $\lambda$).

When comparing our two methods on the same instance with the same parameters, the AHIHT method is faster than the HIHT method, but the qualities of the solutions generated by the two methods are almost the same.

In the second experiment, we show that the regularization parameter $\lambda$ significantly affects the quality of the solutions generated by the viHT method. The size of the instance, the way the data were generated and the parameter settings are the same as in the first experiment. The results of the viHT method on the generated instance are reported in Table 3, from which we can see that if the regularization parameter $\lambda$ is not chosen suitably, the quality of the reconstructed signal is very poor.

From the above two experiments, we find that our HIHT/AHIHT methods do not depend on the target value $\lambda_t$ as strongly as the viHT method depends on the regularization parameter $\lambda$. Hence, our methods can, to some degree, overcome the shortcoming of the viHT method.

5.2 Comparison with other homotopy methods

In this subsection, we compare the performance of our HIHT/AHIHT methods with that of the state-of-the-art PGH/ADGH methods for solving the compressed sensing problem, since they are also homotopy methods, but regularized with a different norm. A test instance was generated in the same way as in the first experiment. To be fair, all parameter values were set to the default values of the PGH/ADGH packages, except that the termination condition of all methods (including PGH/ADGH) was changed to (26), and that the target value $\lambda_t$ of our methods was set to 0.01, while it was 1 for the PGH/ADGH methods since this is their default value. When using the PGH/ADGH packages, all other parameters were set to their defaults.

Numerical results are reported in Table 4 and Figures 1-3. Table 4 presents the CPU time required by the compared methods, as well as the size of the support set of the reconstructed data $\hat{x}$ obtained by each method, the mean squared error (MSE) with respect to $\bar{x}$, and the error $\|A\hat{x} - y\|^2$.

Table 4: Numerical comparison of the methods on an instance with $m = 1000$, $n = 5000$ and $|S(\bar{x})| = 100$, where the noise $z$ is uniformly distributed over $[-0.01, 0.01]$ (columns: Alg., time(s), $|S(\hat{x})|$, MSE($\times 10^{-6}$), Er; rows: PGH, HIHT, AHIHT, ADGH).

Figure 1: Objective gap for each iteration.

From the column time(s), we can see that the AHIHT method is faster than the HIHT method, and both methods are much faster than the PGH and ADGH methods. Regarding the size of the support set of the reconstructed signal, the HIHT/AHIHT methods reconstruct signals with the same number of nonzero components as the original signal $\bar{x}$, whereas the PGH/ADGH methods do not. Furthermore, from the columns MSE and Er, we see that our methods reconstruct $\bar{x}$ with smaller errors. Hence, in this experiment, our methods outperform the PGH/ADGH methods both in running time and in solution quality.

Figure 1 shows the relationship between the total number of iterations $k$ and the objective gap$^2$. Although the descent speeds of the PGH method and the HIHT method are almost the same, they are faster than the ADGH method, while the AHIHT method needs the fewest iterations among them, taking only about 30 iterations to reach the termination condition.

$^2$In Figure 1, for the HIHT/AHIHT methods, $\phi^*$ is the minimum value of $\phi(x) = \frac{1}{2}\|Ax - y\|^2 + \lambda_t\|x\|_0$ and $\phi(x^k) = \frac{1}{2}\|Ax^k - y\|^2 + \lambda_k\|x^k\|_0$; for the PGH/ADGH methods, $\phi^*$ is the minimum value of $\phi(x) = \frac{1}{2}\|Ax - y\|^2 + \lambda_t\|x\|_1$ and $\phi(x^k) = \frac{1}{2}\|Ax^k - y\|^2 + \lambda_k\|x^k\|_1$. The minimum values $\phi^*$ are obtained by the corresponding methods with a smaller $\epsilon$ value, such as $10^{-6}$ in our experiments.

Figure 2: Sparsity for each $x^k$.

Figure 3: The number of inner iterations for each $\lambda_k$.

Figure 2 shows the sparsity of $\{x^k\}$ or $\{x^{k,i}\}$ at each iteration as the algorithms progress. It shows that the sizes of the support sets of the sequences $\{x^k\}$ or $\{x^{k,i}\}$ generated by our HIHT/AHIHT methods are much more stable than those generated by the PGH/ADGH methods. During the first 10 iterations, they are stable in almost the same manner; after that, however, for the PGH/ADGH methods the sizes of the support sets fluctuate until 80 or more iterations. To reach approximate optimal solutions, the total numbers of iterations of our HIHT/AHIHT methods are smaller. Moreover, we find that after dozens of steps, the support sets of the sequences generated by our proposed methods remain unchanged in this experiment.

Figure 3 shows the number of inner iterations of each homotopy method, i.e., the number of inner iterations for each regularization parameter $\lambda$. It shows that, except for the last one, the number of inner iterations for each fixed regularization parameter $\lambda$ is small: in fact, there are no more than 10 inner iterations, and the HIHT method takes only 3 or 4 inner iterations to reach the corresponding precision. However, all algorithms need more inner iterations at the last regularization parameter value. This is because, when the homotopy methods stop, a warm-starting strategy is used with the initial values given by the corresponding homotopy methods and a higher precision $\epsilon = 10^{-5}$, in order to obtain more accurate solutions.

In the following fourth experiment, since the PGH method outperforms the ADGH method in the third experiment above, we only compare the quality and performance of the three main methods HIHT/AHIHT/PGH at different sparsity levels. For each value of the sparsity level $|S(\bar{x})|$, we randomly generated 50 instances of the same size using the method of the first experiment. The parameters of PGH were set to their defaults, except that the termination condition was changed to (26). For the HIHT/AHIHT methods, we set the target value $\lambda_t = 0.01$, since at this value our methods performed well in the first experiment, and we set $\rho = 0.7$, since this is the default value in the PGH/ADGH packages. The other parameters were set to the same values as in the first experiment, i.e., the same as the defaults of the PGH/ADGH packages. The averaged results are presented in Table 5.

Table 5: Averaged results over the 50 random instances with $m = 1000$, $n = 5000$, where the noise $z$ is uniformly distributed over $[-0.01, 0.01]$, for each sparsity level $|S(\bar{x})|$ (columns: Alg., time(s), $|S(\hat{x})|$, MSE($\times 10^{-6}$), Er; rows: PGH, HIHT and AHIHT for each sparsity level).

The results in Table 5 show that the three methods solve the CS problem efficiently at low sparsity levels but poorly at high sparsity levels. The PGH method can reconstruct the signals when the sparsity level is 50, while our HIHT/AHIHT methods can reconstruct the signals when the sparsity levels are not larger than 250 and 200, respectively. As for the running time, our HIHT/AHIHT methods solve the CS problem faster than the PGH method at the same sparsity level, especially as the sparsity level becomes higher. Comparing our two methods, the AHIHT method is faster than the HIHT method with the same solution quality when the sparsity level is low; however, if the sparsity level reaches 250, the AHIHT method cannot reconstruct the signals well, and if the sparsity level is 300, neither of our two methods can reconstruct the signals.

6 Conclusions

In this paper, we have applied the homotopy technique directly to the $\ell_0$-regularized sparsity problem and have proposed two homotopy methods, HIHT and AHIHT, to approximately solve the compressed sensing problem. Under some mild assumptions, the two methods converge to feasible solutions of the problem. Moreover, we have given a definition of local minimizer for problem (1). Theoretical analyses indicate that, under some conditions, the numbers of nonzero components of the limits of the sequences produced by our methods are bounded above, and the limit solutions are local minimizers of problem (1).

To improve the solution quality of the methods, we have given two empirical algorithms, which consist of two stages: in the first stage, our HIHT/AHIHT methods are called, and in the second stage, the viHT method is used to improve the solutions given by the first stage. From the experimental results, we find that our two empirical algorithms can efficiently solve CS problems with high quality and can overcome the shortcoming of the viHT method to some degree. Moreover, compared with the two state-of-the-art homotopy methods PGH/ADGH, the solutions generated by our two empirical algorithms are better than those of the PGH/ADGH methods both in running time and in solution quality. In fact, with suitable parameter values, our two proposed homotopy methods can almost always recover the noisy signals.

References

[1] A. Beck and M. Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences, 2(1).

[2] T. Blumensath. Accelerated iterative hard thresholding. Signal Processing, 92(3).

[3] T. Blumensath and M. E. Davies. Iterative thresholding for sparse approximations. Journal of Constructive Approximation, 14(5-6).

[4] T. Blumensath and M. E. Davies. Iterative hard thresholding for compressed sensing. Applied and Computational Harmonic Analysis, 27(3).

[5] T. Blumensath and M. E. Davies. Normalized iterative hard thresholding: guaranteed stability and performance. IEEE Journal of Selected Topics in Signal Processing, 4(2).

[6] E. J. Candes, J. Romberg, and T. Tao. Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Transactions on Information Theory, 52(2).

[7] E. J. Candes, M. B. Wakin, and S. P. Boyd. Enhancing sparsity by reweighted $\ell_1$ minimization. Journal of Fourier Analysis and Applications, 14(5-6).

[8] R. Chartrand. Exact reconstruction of sparse signals via nonconvex minimization. IEEE Signal Processing Letters, 14(10).

[9] X. Chen, M. Ng, and C. Zhang. Nonconvex $\ell_p$ regularization and box constrained model for image restoration. IEEE Transactions on Image Processing, 21(12).

[10] P. L. Combettes and V. R. Wajs. Signal recovery by proximal forward-backward splitting. Multiscale Modeling and Simulation, 4(4).

[11] G. M. Davis, S. Mallat, and M. Avellaneda. Adaptive greedy approximations. Journal of Constructive Approximation, 13(1): 57-98.

[12] D. L. Donoho and M. Elad. Optimally sparse representation in general (nonorthogonal) dictionaries via $\ell_1$ minimization. Proc. Natl. Acad. Sci. USA, 100.

[13] B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani. Least angle regression. The Annals of Statistics, 32(2).

[14] M. A. Figueiredo, R. D. Nowak, and S. J. Wright. Gradient projection for sparse reconstruction: application to compressed sensing and other inverse problems. IEEE Journal of Selected Topics in Signal Processing, 1(4).

[15] E. T. Hale, W. Yin, and Y. Zhang. Fixed-point continuation for $\ell_1$-minimization: methodology and convergence. SIAM Journal on Optimization, 19(3).

[16] M. J. Lai, Y. Y. Xu, and W. T. Yin. Improved iteratively reweighted least squares for unconstrained smoothed $\ell_q$ minimization. SIAM Journal on Numerical Analysis, 51(2).

[17] Q. Lin and L. Xiao. An adaptive accelerated proximal gradient method and its homotopy continuation for sparse optimization. Technical Report MSR-TR, Microsoft Research.

[18] Z. Lu. Iterative hard thresholding methods for $\ell_0$ regularized convex cone programming. Mathematical Programming, to appear.

[19] Z. Lu and Y. Zhang. Sparse approximation via penalty decomposition methods. SIAM Journal on Optimization, to appear.

[20] D. M. Malioutov and A. S. Willsky. Homotopy continuation for sparse signal representation. IEEE International Conference on Acoustics, Speech and Signal Processing, vol. 5.

[21] S. Mallat and Z. Zhang. Matching pursuit in a time-frequency dictionary. IEEE Transactions on Signal Processing, 41(12).

[22] B. K. Natarajan. Sparse approximate solutions to linear systems. SIAM Journal on Computing, 24(2).

[23] Y. Nesterov. Gradient methods for minimizing composite functions. Mathematical Programming, 140(1).

[24] M. Nikolova. Description of the minimizers of least squares regularized with $\ell_0$-norm. Uniqueness of the global minimizer. SIAM Journal on Imaging Sciences, 6(2).

[25] S. J. Wright, R. D. Nowak, and M. A. T. Figueiredo. Sparse reconstruction by separable approximation. IEEE Transactions on Signal Processing, 57(7).

[26] L. Xiao and T. Zhang. A proximal-gradient homotopy method for the sparse least-squares problem. SIAM Journal on Optimization, 23(2).

[27] J. Yang and Y. Zhang. Alternating direction algorithms for $\ell_1$-problems in compressive sensing. SIAM Journal on Scientific Computing, 33(1).


More information

Orthogonal Matching Pursuit for Sparse Signal Recovery With Noise

Orthogonal Matching Pursuit for Sparse Signal Recovery With Noise Orthogonal Matching Pursuit for Sparse Signal Recovery With Noise The MIT Faculty has made this article openly available. Please share how this access benefits you. Your story matters. Citation As Published

More information

Noisy Signal Recovery via Iterative Reweighted L1-Minimization

Noisy Signal Recovery via Iterative Reweighted L1-Minimization Noisy Signal Recovery via Iterative Reweighted L1-Minimization Deanna Needell UC Davis / Stanford University Asilomar SSC, November 2009 Problem Background Setup 1 Suppose x is an unknown signal in R d.

More information

Elaine T. Hale, Wotao Yin, Yin Zhang

Elaine T. Hale, Wotao Yin, Yin Zhang , Wotao Yin, Yin Zhang Department of Computational and Applied Mathematics Rice University McMaster University, ICCOPT II-MOPTA 2007 August 13, 2007 1 with Noise 2 3 4 1 with Noise 2 3 4 1 with Noise 2

More information

EUSIPCO

EUSIPCO EUSIPCO 013 1569746769 SUBSET PURSUIT FOR ANALYSIS DICTIONARY LEARNING Ye Zhang 1,, Haolong Wang 1, Tenglong Yu 1, Wenwu Wang 1 Department of Electronic and Information Engineering, Nanchang University,

More information

Machine Learning for Signal Processing Sparse and Overcomplete Representations

Machine Learning for Signal Processing Sparse and Overcomplete Representations Machine Learning for Signal Processing Sparse and Overcomplete Representations Abelino Jimenez (slides from Bhiksha Raj and Sourish Chaudhuri) Oct 1, 217 1 So far Weights Data Basis Data Independent ICA

More information

Learning with stochastic proximal gradient

Learning with stochastic proximal gradient Learning with stochastic proximal gradient Lorenzo Rosasco DIBRIS, Università di Genova Via Dodecaneso, 35 16146 Genova, Italy lrosasco@mit.edu Silvia Villa, Băng Công Vũ Laboratory for Computational and

More information

Sparse Solutions of Systems of Equations and Sparse Modelling of Signals and Images

Sparse Solutions of Systems of Equations and Sparse Modelling of Signals and Images Sparse Solutions of Systems of Equations and Sparse Modelling of Signals and Images Alfredo Nava-Tudela ant@umd.edu John J. Benedetto Department of Mathematics jjb@umd.edu Abstract In this project we are

More information

Robust Sparse Recovery via Non-Convex Optimization

Robust Sparse Recovery via Non-Convex Optimization Robust Sparse Recovery via Non-Convex Optimization Laming Chen and Yuantao Gu Department of Electronic Engineering, Tsinghua University Homepage: http://gu.ee.tsinghua.edu.cn/ Email: gyt@tsinghua.edu.cn

More information

Convex Optimization. Newton s method. ENSAE: Optimisation 1/44

Convex Optimization. Newton s method. ENSAE: Optimisation 1/44 Convex Optimization Newton s method ENSAE: Optimisation 1/44 Unconstrained minimization minimize f(x) f convex, twice continuously differentiable (hence dom f open) we assume optimal value p = inf x f(x)

More information

An Homotopy Algorithm for the Lasso with Online Observations

An Homotopy Algorithm for the Lasso with Online Observations An Homotopy Algorithm for the Lasso with Online Observations Pierre J. Garrigues Department of EECS Redwood Center for Theoretical Neuroscience University of California Berkeley, CA 94720 garrigue@eecs.berkeley.edu

More information

Accelerated Dual Gradient-Based Methods for Total Variation Image Denoising/Deblurring Problems (and other Inverse Problems)

Accelerated Dual Gradient-Based Methods for Total Variation Image Denoising/Deblurring Problems (and other Inverse Problems) Accelerated Dual Gradient-Based Methods for Total Variation Image Denoising/Deblurring Problems (and other Inverse Problems) Donghwan Kim and Jeffrey A. Fessler EECS Department, University of Michigan

More information

Motivation Sparse Signal Recovery is an interesting area with many potential applications. Methods developed for solving sparse signal recovery proble

Motivation Sparse Signal Recovery is an interesting area with many potential applications. Methods developed for solving sparse signal recovery proble Bayesian Methods for Sparse Signal Recovery Bhaskar D Rao 1 University of California, San Diego 1 Thanks to David Wipf, Zhilin Zhang and Ritwik Giri Motivation Sparse Signal Recovery is an interesting

More information

ECE G: Special Topics in Signal Processing: Sparsity, Structure, and Inference

ECE G: Special Topics in Signal Processing: Sparsity, Structure, and Inference ECE 18-898G: Special Topics in Signal Processing: Sparsity, Structure, and Inference Sparse Recovery using L1 minimization - algorithms Yuejie Chi Department of Electrical and Computer Engineering Spring

More information

A Randomized Nonmonotone Block Proximal Gradient Method for a Class of Structured Nonlinear Programming

A Randomized Nonmonotone Block Proximal Gradient Method for a Class of Structured Nonlinear Programming A Randomized Nonmonotone Block Proximal Gradient Method for a Class of Structured Nonlinear Programming Zhaosong Lu Lin Xiao March 9, 2015 (Revised: May 13, 2016; December 30, 2016) Abstract We propose

More information

On the Minimization Over Sparse Symmetric Sets: Projections, O. Projections, Optimality Conditions and Algorithms

On the Minimization Over Sparse Symmetric Sets: Projections, O. Projections, Optimality Conditions and Algorithms On the Minimization Over Sparse Symmetric Sets: Projections, Optimality Conditions and Algorithms Amir Beck Technion - Israel Institute of Technology Haifa, Israel Based on joint work with Nadav Hallak

More information

OWL to the rescue of LASSO

OWL to the rescue of LASSO OWL to the rescue of LASSO IISc IBM day 2018 Joint Work R. Sankaran and Francis Bach AISTATS 17 Chiranjib Bhattacharyya Professor, Department of Computer Science and Automation Indian Institute of Science,

More information

Sparse Approximation via Penalty Decomposition Methods

Sparse Approximation via Penalty Decomposition Methods Sparse Approximation via Penalty Decomposition Methods Zhaosong Lu Yong Zhang February 19, 2012 Abstract In this paper we consider sparse approximation problems, that is, general l 0 minimization problems

More information

Randomized Block Coordinate Non-Monotone Gradient Method for a Class of Nonlinear Programming

Randomized Block Coordinate Non-Monotone Gradient Method for a Class of Nonlinear Programming Randomized Block Coordinate Non-Monotone Gradient Method for a Class of Nonlinear Programming Zhaosong Lu Lin Xiao June 25, 2013 Abstract In this paper we propose a randomized block coordinate non-monotone

More information

EE 381V: Large Scale Optimization Fall Lecture 24 April 11

EE 381V: Large Scale Optimization Fall Lecture 24 April 11 EE 381V: Large Scale Optimization Fall 2012 Lecture 24 April 11 Lecturer: Caramanis & Sanghavi Scribe: Tao Huang 24.1 Review In past classes, we studied the problem of sparsity. Sparsity problem is that

More information

MIST: l 0 Sparse Linear Regression with Momentum

MIST: l 0 Sparse Linear Regression with Momentum MIST: l 0 Sparse Linear Regression with Momentum Goran Marjanovic, Magnus O. Ulfarsson, Alfred O. Hero III arxiv:409.793v [stat.ml] 9 Mar 05 Abstract Significant attention has been given to minimizing

More information

Sparse Solutions of an Undetermined Linear System

Sparse Solutions of an Undetermined Linear System 1 Sparse Solutions of an Undetermined Linear System Maddullah Almerdasy New York University Tandon School of Engineering arxiv:1702.07096v1 [math.oc] 23 Feb 2017 Abstract This work proposes a research

More information

Color Scheme. swright/pcmi/ M. Figueiredo and S. Wright () Inference and Optimization PCMI, July / 14

Color Scheme.   swright/pcmi/ M. Figueiredo and S. Wright () Inference and Optimization PCMI, July / 14 Color Scheme www.cs.wisc.edu/ swright/pcmi/ M. Figueiredo and S. Wright () Inference and Optimization PCMI, July 2016 1 / 14 Statistical Inference via Optimization Many problems in statistical inference

More information

SPARSE signal representations have gained popularity in recent

SPARSE signal representations have gained popularity in recent 6958 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 57, NO. 10, OCTOBER 2011 Blind Compressed Sensing Sivan Gleichman and Yonina C. Eldar, Senior Member, IEEE Abstract The fundamental principle underlying

More information

Shiqian Ma, MAT-258A: Numerical Optimization 1. Chapter 9. Alternating Direction Method of Multipliers

Shiqian Ma, MAT-258A: Numerical Optimization 1. Chapter 9. Alternating Direction Method of Multipliers Shiqian Ma, MAT-258A: Numerical Optimization 1 Chapter 9 Alternating Direction Method of Multipliers Shiqian Ma, MAT-258A: Numerical Optimization 2 Separable convex optimization a special case is min f(x)

More information

Adaptive Primal Dual Optimization for Image Processing and Learning

Adaptive Primal Dual Optimization for Image Processing and Learning Adaptive Primal Dual Optimization for Image Processing and Learning Tom Goldstein Rice University tag7@rice.edu Ernie Esser University of British Columbia eesser@eos.ubc.ca Richard Baraniuk Rice University

More information

Block Coordinate Descent for Regularized Multi-convex Optimization

Block Coordinate Descent for Regularized Multi-convex Optimization Block Coordinate Descent for Regularized Multi-convex Optimization Yangyang Xu and Wotao Yin CAAM Department, Rice University February 15, 2013 Multi-convex optimization Model definition Applications Outline

More information

Lecture: Introduction to Compressed Sensing Sparse Recovery Guarantees

Lecture: Introduction to Compressed Sensing Sparse Recovery Guarantees Lecture: Introduction to Compressed Sensing Sparse Recovery Guarantees http://bicmr.pku.edu.cn/~wenzw/bigdata2018.html Acknowledgement: this slides is based on Prof. Emmanuel Candes and Prof. Wotao Yin

More information

Compressed sensing. Or: the equation Ax = b, revisited. Terence Tao. Mahler Lecture Series. University of California, Los Angeles

Compressed sensing. Or: the equation Ax = b, revisited. Terence Tao. Mahler Lecture Series. University of California, Los Angeles Or: the equation Ax = b, revisited University of California, Los Angeles Mahler Lecture Series Acquiring signals Many types of real-world signals (e.g. sound, images, video) can be viewed as an n-dimensional

More information

A Brief Overview of Practical Optimization Algorithms in the Context of Relaxation

A Brief Overview of Practical Optimization Algorithms in the Context of Relaxation A Brief Overview of Practical Optimization Algorithms in the Context of Relaxation Zhouchen Lin Peking University April 22, 2018 Too Many Opt. Problems! Too Many Opt. Algorithms! Zero-th order algorithms:

More information

Sparse Solutions of Linear Systems of Equations and Sparse Modeling of Signals and Images!

Sparse Solutions of Linear Systems of Equations and Sparse Modeling of Signals and Images! Sparse Solutions of Linear Systems of Equations and Sparse Modeling of Signals and Images! Alfredo Nava-Tudela John J. Benedetto, advisor 1 Happy birthday Lucía! 2 Outline - Problem: Find sparse solutions

More information

Master 2 MathBigData. 3 novembre CMAP - Ecole Polytechnique

Master 2 MathBigData. 3 novembre CMAP - Ecole Polytechnique Master 2 MathBigData S. Gaïffas 1 3 novembre 2014 1 CMAP - Ecole Polytechnique 1 Supervised learning recap Introduction Loss functions, linearity 2 Penalization Introduction Ridge Sparsity Lasso 3 Some

More information

Coordinate descent. Geoff Gordon & Ryan Tibshirani Optimization /

Coordinate descent. Geoff Gordon & Ryan Tibshirani Optimization / Coordinate descent Geoff Gordon & Ryan Tibshirani Optimization 10-725 / 36-725 1 Adding to the toolbox, with stats and ML in mind We ve seen several general and useful minimization tools First-order methods

More information

Analysis of Greedy Algorithms

Analysis of Greedy Algorithms Analysis of Greedy Algorithms Jiahui Shen Florida State University Oct.26th Outline Introduction Regularity condition Analysis on orthogonal matching pursuit Analysis on forward-backward greedy algorithm

More information

A New Estimate of Restricted Isometry Constants for Sparse Solutions

A New Estimate of Restricted Isometry Constants for Sparse Solutions A New Estimate of Restricted Isometry Constants for Sparse Solutions Ming-Jun Lai and Louis Y. Liu January 12, 211 Abstract We show that as long as the restricted isometry constant δ 2k < 1/2, there exist

More information

Convex Optimization and l 1 -minimization

Convex Optimization and l 1 -minimization Convex Optimization and l 1 -minimization Sangwoon Yun Computational Sciences Korea Institute for Advanced Study December 11, 2009 2009 NIMS Thematic Winter School Outline I. Convex Optimization II. l

More information

Lecture 23: November 21

Lecture 23: November 21 10-725/36-725: Convex Optimization Fall 2016 Lecturer: Ryan Tibshirani Lecture 23: November 21 Scribes: Yifan Sun, Ananya Kumar, Xin Lu Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer:

More information

Sparsity Regularization

Sparsity Regularization Sparsity Regularization Bangti Jin Course Inverse Problems & Imaging 1 / 41 Outline 1 Motivation: sparsity? 2 Mathematical preliminaries 3 l 1 solvers 2 / 41 problem setup finite-dimensional formulation

More information

Multiple Change Point Detection by Sparse Parameter Estimation

Multiple Change Point Detection by Sparse Parameter Estimation Multiple Change Point Detection by Sparse Parameter Estimation Department of Econometrics Fac. of Economics and Management University of Defence Brno, Czech Republic Dept. of Appl. Math. and Comp. Sci.

More information

2 Regularized Image Reconstruction for Compressive Imaging and Beyond

2 Regularized Image Reconstruction for Compressive Imaging and Beyond EE 367 / CS 448I Computational Imaging and Display Notes: Compressive Imaging and Regularized Image Reconstruction (lecture ) Gordon Wetzstein gordon.wetzstein@stanford.edu This document serves as a supplement

More information

Thresholds for the Recovery of Sparse Solutions via L1 Minimization

Thresholds for the Recovery of Sparse Solutions via L1 Minimization Thresholds for the Recovery of Sparse Solutions via L Minimization David L. Donoho Department of Statistics Stanford University 39 Serra Mall, Sequoia Hall Stanford, CA 9435-465 Email: donoho@stanford.edu

More information

Convex Optimization. (EE227A: UC Berkeley) Lecture 15. Suvrit Sra. (Gradient methods III) 12 March, 2013

Convex Optimization. (EE227A: UC Berkeley) Lecture 15. Suvrit Sra. (Gradient methods III) 12 March, 2013 Convex Optimization (EE227A: UC Berkeley) Lecture 15 (Gradient methods III) 12 March, 2013 Suvrit Sra Optimal gradient methods 2 / 27 Optimal gradient methods We saw following efficiency estimates for

More information

Gauge optimization and duality

Gauge optimization and duality 1 / 54 Gauge optimization and duality Junfeng Yang Department of Mathematics Nanjing University Joint with Shiqian Ma, CUHK September, 2015 2 / 54 Outline Introduction Duality Lagrange duality Fenchel

More information

Lecture 25: November 27

Lecture 25: November 27 10-725: Optimization Fall 2012 Lecture 25: November 27 Lecturer: Ryan Tibshirani Scribes: Matt Wytock, Supreeth Achar Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer: These notes have

More information

ABSTRACT. Recovering Data with Group Sparsity by Alternating Direction Methods. Wei Deng

ABSTRACT. Recovering Data with Group Sparsity by Alternating Direction Methods. Wei Deng ABSTRACT Recovering Data with Group Sparsity by Alternating Direction Methods by Wei Deng Group sparsity reveals underlying sparsity patterns and contains rich structural information in data. Hence, exploiting

More information

Exact penalty decomposition method for zero-norm minimization based on MPEC formulation 1

Exact penalty decomposition method for zero-norm minimization based on MPEC formulation 1 Exact penalty decomposition method for zero-norm minimization based on MPEC formulation Shujun Bi, Xiaolan Liu and Shaohua Pan November, 2 (First revised July 5, 22) (Second revised March 2, 23) (Final

More information

Reconstruction of Block-Sparse Signals by Using an l 2/p -Regularized Least-Squares Algorithm

Reconstruction of Block-Sparse Signals by Using an l 2/p -Regularized Least-Squares Algorithm Reconstruction of Block-Sparse Signals by Using an l 2/p -Regularized Least-Squares Algorithm Jeevan K. Pant, Wu-Sheng Lu, and Andreas Antoniou University of Victoria May 21, 2012 Compressive Sensing 1/23

More information

Distributed Inexact Newton-type Pursuit for Non-convex Sparse Learning

Distributed Inexact Newton-type Pursuit for Non-convex Sparse Learning Distributed Inexact Newton-type Pursuit for Non-convex Sparse Learning Bo Liu Department of Computer Science, Rutgers Univeristy Xiao-Tong Yuan BDAT Lab, Nanjing University of Information Science and Technology

More information

MATCHING PURSUIT WITH STOCHASTIC SELECTION

MATCHING PURSUIT WITH STOCHASTIC SELECTION 2th European Signal Processing Conference (EUSIPCO 22) Bucharest, Romania, August 27-3, 22 MATCHING PURSUIT WITH STOCHASTIC SELECTION Thomas Peel, Valentin Emiya, Liva Ralaivola Aix-Marseille Université

More information

A direct formulation for sparse PCA using semidefinite programming

A direct formulation for sparse PCA using semidefinite programming A direct formulation for sparse PCA using semidefinite programming A. d Aspremont, L. El Ghaoui, M. Jordan, G. Lanckriet ORFE, Princeton University & EECS, U.C. Berkeley Available online at www.princeton.edu/~aspremon

More information

Optimization. Benjamin Recht University of California, Berkeley Stephen Wright University of Wisconsin-Madison

Optimization. Benjamin Recht University of California, Berkeley Stephen Wright University of Wisconsin-Madison Optimization Benjamin Recht University of California, Berkeley Stephen Wright University of Wisconsin-Madison optimization () cost constraints might be too much to cover in 3 hours optimization (for big

More information

Sparse & Redundant Signal Representation, and its Role in Image Processing

Sparse & Redundant Signal Representation, and its Role in Image Processing Sparse & Redundant Signal Representation, and its Role in Michael Elad The CS Department The Technion Israel Institute of technology Haifa 3000, Israel Wave 006 Wavelet and Applications Ecole Polytechnique

More information

Accelerated primal-dual methods for linearly constrained convex problems

Accelerated primal-dual methods for linearly constrained convex problems Accelerated primal-dual methods for linearly constrained convex problems Yangyang Xu SIAM Conference on Optimization May 24, 2017 1 / 23 Accelerated proximal gradient For convex composite problem: minimize

More information

Recent Advances in Structured Sparse Models

Recent Advances in Structured Sparse Models Recent Advances in Structured Sparse Models Julien Mairal Willow group - INRIA - ENS - Paris 21 September 2010 LEAR seminar At Grenoble, September 21 st, 2010 Julien Mairal Recent Advances in Structured

More information

Frank-Wolfe Method. Ryan Tibshirani Convex Optimization

Frank-Wolfe Method. Ryan Tibshirani Convex Optimization Frank-Wolfe Method Ryan Tibshirani Convex Optimization 10-725 Last time: ADMM For the problem min x,z f(x) + g(z) subject to Ax + Bz = c we form augmented Lagrangian (scaled form): L ρ (x, z, w) = f(x)

More information

Fast Hard Thresholding with Nesterov s Gradient Method

Fast Hard Thresholding with Nesterov s Gradient Method Fast Hard Thresholding with Nesterov s Gradient Method Volkan Cevher Idiap Research Institute Ecole Polytechnique Federale de ausanne volkan.cevher@epfl.ch Sina Jafarpour Department of Computer Science

More information

Block stochastic gradient update method

Block stochastic gradient update method Block stochastic gradient update method Yangyang Xu and Wotao Yin IMA, University of Minnesota Department of Mathematics, UCLA November 1, 2015 This work was done while in Rice University 1 / 26 Stochastic

More information

Tractable Upper Bounds on the Restricted Isometry Constant

Tractable Upper Bounds on the Restricted Isometry Constant Tractable Upper Bounds on the Restricted Isometry Constant Alex d Aspremont, Francis Bach, Laurent El Ghaoui Princeton University, École Normale Supérieure, U.C. Berkeley. Support from NSF, DHS and Google.

More information

MLCC 2018 Variable Selection and Sparsity. Lorenzo Rosasco UNIGE-MIT-IIT

MLCC 2018 Variable Selection and Sparsity. Lorenzo Rosasco UNIGE-MIT-IIT MLCC 2018 Variable Selection and Sparsity Lorenzo Rosasco UNIGE-MIT-IIT Outline Variable Selection Subset Selection Greedy Methods: (Orthogonal) Matching Pursuit Convex Relaxation: LASSO & Elastic Net

More information

SVRG++ with Non-uniform Sampling

SVRG++ with Non-uniform Sampling SVRG++ with Non-uniform Sampling Tamás Kern András György Department of Electrical and Electronic Engineering Imperial College London, London, UK, SW7 2BT {tamas.kern15,a.gyorgy}@imperial.ac.uk Abstract

More information

TRACKING SOLUTIONS OF TIME VARYING LINEAR INVERSE PROBLEMS

TRACKING SOLUTIONS OF TIME VARYING LINEAR INVERSE PROBLEMS TRACKING SOLUTIONS OF TIME VARYING LINEAR INVERSE PROBLEMS Martin Kleinsteuber and Simon Hawe Department of Electrical Engineering and Information Technology, Technische Universität München, München, Arcistraße

More information

Group Sparse Optimization via l p,q Regularization

Group Sparse Optimization via l p,q Regularization Journal of Machine Learning Research 8 (27) -52 Submitted 2/5; Revised 2/7; Published 4/7 Group Sparse Optimization via l p,q Regularization Yaohua Hu College of Mathematics and Statistics Shenzhen University

More information

Proximal Newton Method. Ryan Tibshirani Convex Optimization /36-725

Proximal Newton Method. Ryan Tibshirani Convex Optimization /36-725 Proximal Newton Method Ryan Tibshirani Convex Optimization 10-725/36-725 1 Last time: primal-dual interior-point method Given the problem min x subject to f(x) h i (x) 0, i = 1,... m Ax = b where f, h

More information

Sparse Signal Reconstruction with Hierarchical Decomposition

Sparse Signal Reconstruction with Hierarchical Decomposition Sparse Signal Reconstruction with Hierarchical Decomposition Ming Zhong Advisor: Dr. Eitan Tadmor AMSC and CSCAMM University of Maryland College Park College Park, Maryland 20742 USA November 8, 2012 Abstract

More information

Machine Learning for Signal Processing Sparse and Overcomplete Representations. Bhiksha Raj (slides from Sourish Chaudhuri) Oct 22, 2013

Machine Learning for Signal Processing Sparse and Overcomplete Representations. Bhiksha Raj (slides from Sourish Chaudhuri) Oct 22, 2013 Machine Learning for Signal Processing Sparse and Overcomplete Representations Bhiksha Raj (slides from Sourish Chaudhuri) Oct 22, 2013 1 Key Topics in this Lecture Basics Component-based representations

More information

Contraction Methods for Convex Optimization and Monotone Variational Inequalities No.16

Contraction Methods for Convex Optimization and Monotone Variational Inequalities No.16 XVI - 1 Contraction Methods for Convex Optimization and Monotone Variational Inequalities No.16 A slightly changed ADMM for convex optimization with three separable operators Bingsheng He Department of

More information

A Fast Augmented Lagrangian Algorithm for Learning Low-Rank Matrices

A Fast Augmented Lagrangian Algorithm for Learning Low-Rank Matrices A Fast Augmented Lagrangian Algorithm for Learning Low-Rank Matrices Ryota Tomioka 1, Taiji Suzuki 1, Masashi Sugiyama 2, Hisashi Kashima 1 1 The University of Tokyo 2 Tokyo Institute of Technology 2010-06-22

More information