MINIMIZATION OF TRANSFORMED L1 PENALTY: THEORY, DIFFERENCE OF CONVEX FUNCTION ALGORITHM, AND ROBUST APPLICATION IN COMPRESSED SENSING


SHUAI ZHANG AND JACK XIN

Abstract. We study the minimization problem of a non-convex sparsity-promoting penalty function, the transformed $\ell_1$ (TL1), and its application in compressed sensing (CS). The TL1 penalty interpolates the $\ell_0$ and $\ell_1$ norms through a nonnegative parameter $a \in (0, +\infty)$, similar to $\ell_p$ with $p \in (0,1]$, and is known to satisfy unbiasedness, sparsity and Lipschitz continuity properties. We first consider the constrained minimization problem and discuss the exact recovery of the $\ell_0$ norm minimal solution based on the null space property (NSP). We then prove the stable recovery of the $\ell_0$ norm minimal solution if the sensing matrix $A$ satisfies a restricted isometry property (RIP). We formulate a normalized problem to overcome the lack of scaling property of the TL1 penalty function. For a general sensing matrix $A$, we show that the support set of a local minimizer corresponds to linearly independent columns of $A$. Next, we present difference of convex algorithms for TL1 (DCATL) in computing TL1-regularized constrained and unconstrained problems in CS. The DCATL algorithm involves outer and inner loops of iterations, one-time matrix inversion, repeated shrinkage operations and matrix-vector multiplications. The inner loop concerns an $\ell_1$ minimization problem, on which we employ the Alternating Direction Method of Multipliers (ADMM). For the unconstrained problem, we prove convergence of DCATL to a stationary point satisfying the first order optimality condition. In numerical experiments, we identify the optimal value $a = 1$, and compare DCATL with other CS algorithms on two classes of sensing matrices: Gaussian random matrices and over-sampled discrete cosine transform (DCT) matrices. Among existing algorithms, the iterated reweighted least squares method based on the $\ell_{1/2}$ norm is the best in sparse recovery for Gaussian matrices, and the DCA algorithm based on the $\ell_1$ minus $\ell_2$ penalty is the best for over-sampled DCT matrices. We find that for both classes of sensing matrices, the performance of the DCATL algorithm (initiated with $\ell_1$ minimization) always ranks near the top (if not the top), and is the most robust choice, insensitive to the conditioning of the sensing matrix $A$. DCATL is also competitive in comparison with DCA on other non-convex penalty functions commonly used in statistics with two hyperparameters.

Keywords: Transformed $\ell_1$ penalty, sparse signal recovery theory, difference of convex function algorithm, convergence analysis, coherent random matrices, compressed sensing, robust recovery.

AMS subject classifications. 90C26, 65K10, 90C90.

1. Introduction

Compressed sensing [6, 2] has generated enormous interest and research activities in mathematics, statistics, information science and elsewhere. A basic problem is to reconstruct a sparse signal from a few linear measurements (linear constraints), far fewer than the dimension of the ambient space of the signal. Consider a sparse signal $\bar x \in \mathbb{R}^N$, an $M \times N$ sensing matrix $A$ and an observation $y \in \mathbb{R}^M$, $M \ll N$, such that
$$y = A\bar x + \epsilon,$$
where $\epsilon$ is the observation error (noise). The main objective is to recover $\bar x$ from $y$. The direct approach is $\ell_0$ optimization, in either a constrained formulation
$$\min_{x \in \mathbb{R}^N} \|x\|_0, \quad \text{s.t. } Ax = y, \tag{1.1}$$
or an unconstrained $\ell_0$-regularized optimization
$$\min_{x \in \mathbb{R}^N} \Big\{ \frac{1}{2}\|y - Ax\|_2^2 + \lambda \|x\|_0 \Big\} \tag{1.2}$$

S. Zhang and J.
Xin were partially supported by NSF grants DMS , DMS-22257, and DMS They are with the Department of Mathematics, University of California, Irvine, CA, 92697, USA. szhang3@uci.edu; jxin@math.uci.edu.

with a positive regularization parameter $\lambda$. Since minimizing the $\ell_0$ norm is NP-hard [28], many viable alternatives have been sought. Greedy methods (matching pursuit [26], orthogonal matching pursuit (OMP) [38], and regularized OMP (ROMP) [29]) work well if the dimension $N$ is not too large. For the unconstrained problem (1.2), the penalty decomposition method [24] replaces the term $\lambda\|x\|_0$ by $\rho_k\|x - z\|_2^2 + \lambda\|z\|_0$ and minimizes over $(x, z)$ for a diverging sequence $\rho_k$. The variable $z$ allows the iterative hard thresholding procedure.

The relaxation approach is to replace the $\ell_0$ norm by a continuous sparsity-promoting penalty function $P(x)$. Convex relaxation uniquely selects $P(\cdot)$ as the $\ell_1$ norm. The resulting problem is known as basis pursuit (LASSO in the over-determined regime [36]). The $\ell_1$ algorithms include $\ell_1$-magic [6], Bregman and split Bregman methods [7, 44], and YALL1 [4]. Theoretically, Candès, Tao and coauthors introduced the restricted isometry property (RIP) on $A$ to establish the equivalent and unique global solution to $\ell_1$ minimization and stable sparse recovery results [3, 4, 6].

There are many choices of $P(\cdot)$ for non-convex relaxation, almost all of them known in statistics [5, 27]. One is the $\ell_p$ quasi-norm (a.k.a. bridge penalty), $p \in (0,1)$, with known $\ell_0$ equivalence under RIP [9]. The $\ell_{1/2}$ penalty is representative of this class of penalty functions, with the associated reweighted least squares and half-thresholding algorithms for computation [8, 4, 39]. Near the RIP regime, the $\ell_{1/2}$ penalty tends to have a higher success rate of sparse reconstruction than $\ell_1$. However, it is not as good as $\ell_1$ if the sensing matrix $A$ is far from RIP [23, 42], as we shall see later as well. In the highly non-RIP (coherent) regime, it was recently found that minimization of the difference of the $\ell_1$ and $\ell_2$ norms gives the best sparse recovery results [42, 23]. It is therefore of both theoretical and practical interest to find a non-convex penalty that is consistently better than $\ell_1$ and always ranks among the top in sparse recovery, whether the sensing matrix satisfies RIP or not.

The $\ell_p$ penalty functions are, however, known to bias towards large peaks. In the statistics literature of variable selection, Fan and Li [5] advocated for classes of penalty functions with three desired properties: unbiasedness, sparsity and continuity. To help identify such a penalty function, denoted by $\rho(\cdot)$, Fan and Lv [25] proposed the following condition for characterizing unbiasedness and sparsity-promoting properties.

Condition 1. The penalty function $\rho(\cdot)$ satisfies: (i) $\rho(t)$ is increasing and concave in $t \in [0, \infty)$; (ii) $\rho'(t)$ is continuous with $\rho'(0+) \in (0, \infty)$.

It follows that $\rho'(t)$ is positive and decreasing, and $\rho'(0+)$ is the upper bound of $\rho'(t)$. The penalties satisfying Condition 1 and $\lim_{t \to \infty}\rho'(t) = 0$ enjoy both unbiasedness and sparsity [25]. Though continuity does not generally hold for this class of penalty functions, a special one-parameter family of Lipschitz continuous functions, the so-called transformed $\ell_1$ functions [3], satisfies all three desired properties above [25]. In this paper, we show that minimizing the non-convex transformed $\ell_1$ functions (TL1) by the difference of convex (DC) function algorithm provides a robust CS solution insensitive to the conditioning of $A$. Since verifying an incoherence condition like RIP or the null space property [ ] on a specific matrix is NP-hard, such robustness is a significant attribute of an algorithm.

Let us consider the TL1 function of the form [25]:
$$\rho_a(t) = \frac{(a+1)|t|}{a+|t|}, \quad t \in \mathbb{R}, \tag{1.3}$$
with parameter $a \in (0, +\infty)$.
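As an illustration of the definition (1.3), the following minimal Python sketch evaluates the TL1 penalty and its limiting behavior; the function name and the test values are illustrative, not part of the paper's code.

```python
import numpy as np

def rho_a(t, a=1.0):
    """Transformed l1 (TL1) penalty rho_a(t) = (a+1)|t| / (a+|t|) of (1.3)."""
    t = np.abs(np.asarray(t, dtype=float))
    return (a + 1.0) * t / (a + t)

# As a -> 0+ the penalty approaches the l0 indicator chi_{t != 0};
# as a -> +infinity it approaches |t| (the l1 penalty):
print(rho_a([0.0, 0.5, 2.0], a=1e-3))   # close to [0, 1, 1]
print(rho_a([0.0, 0.5, 2.0], a=1e3))    # close to [0, 0.5, 2.0]
```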

See [3, 9] for alternative forms and the $\ell_0$ approximation property [9]. Another nice property of TL1 is that the TL1 proximal operator has closed-form analytical solutions for all values of the parameter $a$. Fast TL1 iterative thresholding algorithms have been devised and studied for both sparse vector and low-rank matrix recovery problems lately [47, 48].

The rest of the paper is organized as follows. In Section 2, we study theoretical properties of the TL1 penalty and TL1-regularized models in exact and stable recovery of $\ell_0$ minimal solutions. We show the advantage of TL1 over $\ell_1$ in exact recovery under a linear constraint using the generalized null space property [37] and explicit examples. We analyze stable recovery for a linear constraint with observation error based on a RIP condition. We overcome the lack of scaling property of TL1 by introducing a normalized TL1-regularized problem. Though the RIP condition is not sharp, our analysis is the first of its kind for stable recovery by TL1. We also prove that the local minimizers of the TL1 constrained model extract independent columns from the sensing matrix. In Section 3, we present two DC algorithms for TL1 optimization (DCATL). In Section 4, we compare the performance of DCATL with some state-of-the-art methods using two classes of matrices: the Gaussian and the over-sampled discrete cosine transform (DCT) matrices. Numerical experiments indicate that DCATL is robust and consistently top ranked while maintaining high sparse recovery rates across sensing matrices of varying coherence. In Section 5, we compare DCATL with DCA on other non-convex penalties such as PiE [3], MCP [46], and SCAD [5], and find DCATL to be competitive as well. Concluding remarks are in Section 6.

2. Transformed $\ell_1$ (TL1) and its regularization models

The TL1 penalty function $\rho_a(t)$ of (1.3) interpolates the $\ell_0$ and $\ell_1$ norms, as
$$\lim_{a \to 0^+}\rho_a(t) = \chi_{\{t \neq 0\}} \quad \text{and} \quad \lim_{a \to +\infty}\rho_a(t) = |t|.$$
In Fig. 2.1, we compare level lines of $\ell_1$ and TL1 with different parameter values of $a$. With the adjustment of the parameter $a$, TL1 can approximate both $\ell_1$ and $\ell_0$ well. Let us define the TL1 regularization term $P_a(\cdot)$ as
$$P_a(x) = \sum_{i=1}^{N}\rho_a(|x_i|). \tag{2.1}$$
In the following, we consider the constrained TL1 minimization model
$$\min_{x \in \mathbb{R}^N} P_a(x) \quad \text{s.t. } Ax = y, \tag{2.2}$$
and the unconstrained TL1-regularized model
$$\min_{x \in \mathbb{R}^N} f(x) = \min_{x \in \mathbb{R}^N} \frac{1}{2}\|Ax - y\|_2^2 + \lambda P_a(x). \tag{2.3}$$
The following inequalities of $\rho_a$ will be used in the proofs of the TL1 theory.
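A minimal sketch of the regularization term (2.1) and of the objectives of (2.2)-(2.3) follows; the helper names are illustrative and the feasibility tolerance is an assumption, not a value from the paper.

```python
import numpy as np

def P_a(x, a=1.0):
    """TL1 regularization term P_a(x) = sum_i rho_a(|x_i|) of (2.1)."""
    x = np.abs(np.asarray(x, dtype=float))
    return np.sum((a + 1.0) * x / (a + x))

def f_unconstrained(x, A, y, lam, a=1.0):
    """Objective of the unconstrained TL1-regularized model (2.3)."""
    r = A @ x - y
    return 0.5 * np.dot(r, r) + lam * P_a(x, a)

def is_feasible(x, A, y, tol=1e-10):
    """Feasibility check Ax = y for the constrained model (2.2)."""
    return np.linalg.norm(A @ x - y) <= tol
```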

Fig. 2.1: Level lines of $\ell_1$ (panel a) and TL1 with different parameters $a$ (panels b-d). For a large parameter $a$, the graph looks almost the same as $\ell_1$, while for a small value of $a$ it bends toward the axes.

Lemma 2.1. For $a > 0$ and any $x_i$ and $x_j$ in $\mathbb{R}$, the following inequalities hold:
$$\rho_a(|x_i + x_j|) \le \rho_a(|x_i| + |x_j|) \le \rho_a(|x_i|) + \rho_a(|x_j|) \le 2\rho_a\!\Big(\frac{|x_i| + |x_j|}{2}\Big). \tag{2.4}$$
Proof. Let us prove these inequalities one by one, starting from the left.

1) Note that $\rho_a(|t|)$ is increasing in the variable $|t|$. By the triangle inequality $|x_i + x_j| \le |x_i| + |x_j|$, we have
$$\rho_a(|x_i + x_j|) \le \rho_a(|x_i| + |x_j|).$$
2)
$$\rho_a(|x_i|) + \rho_a(|x_j|) = \frac{(a+1)|x_i|}{a+|x_i|} + \frac{(a+1)|x_j|}{a+|x_j|} = \frac{(a+1)\big(|x_i| + |x_j| + 2|x_i||x_j|/a\big)}{a + |x_i| + |x_j| + |x_i||x_j|/a} \ge \frac{(a+1)\big(|x_i| + |x_j| + |x_i||x_j|/a\big)}{a + |x_i| + |x_j| + |x_i||x_j|/a} = \rho_a\big(|x_i| + |x_j| + |x_i||x_j|/a\big) \ge \rho_a\big(|x_i| + |x_j|\big).$$

3) By concavity of the function $\rho_a(\cdot)$,
$$\frac{\rho_a(|x_i|) + \rho_a(|x_j|)}{2} \le \rho_a\!\Big(\frac{|x_i| + |x_j|}{2}\Big).$$

Remark 2.1. It follows from Lemma 2.1 that the triangle inequality holds for the function $\rho(x) := \rho_a(|x|)$:
$$\rho(x_i + x_j) = \rho_a(|x_i + x_j|) \le \rho_a(|x_i|) + \rho_a(|x_j|) = \rho(x_i) + \rho(x_j).$$
Also we have $\rho(x) \ge 0$, and $\rho(x) = 0 \iff x = 0$. Our penalty function $\rho$ acts almost like a norm. However, it lacks absolute scalability, i.e. $\rho(cx) \neq |c|\rho(x)$ in general. The next lemma further analyzes this in terms of inequalities.

Lemma 2.2. For a scalar $t \in \mathbb{R}$,
$$\rho_a(|ct|) \begin{cases} \le |c|\,\rho_a(|t|), & \text{if } |c| > 1; \\ \ge |c|\,\rho_a(|t|), & \text{if } |c| \le 1. \end{cases} \tag{2.5}$$
Proof.
$$\rho_a(|ct|) = \frac{(a+1)|c||t|}{a+|c||t|} = |c|\,\rho_a(|t|)\,\frac{a+|t|}{a+|ct|}.$$
So if $|c| \le 1$, the factor $\frac{a+|t|}{a+|ct|} \ge 1$, and then $\rho_a(|ct|) \ge |c|\rho_a(|t|)$. Similarly, when $|c| > 1$ we have $\rho_a(|ct|) \le |c|\rho_a(|t|)$.

2.1. Exact and Stable Sparse Recovery for the Constrained Model

For the constrained TL1 model (2.2), we discuss the theoretical issue of exact and stable recovery of the $\ell_0$ minimal solution. Specifically, let $x = \bar\beta$ be the unique sparsest solution of $Ax = y$ with $s$ nonzero components; we address whether it is possible to construct it by minimizing $P_a$. Let $A$ be an $M \times N$ matrix, $T \subseteq \{1, \ldots, N\}$, and $A_T$ the matrix consisting of the columns $a_j$ of $A$, $j \in T$. Similarly for a vector $x$, $x_T$ is the sub-vector consisting of components with indices in $T$. Let the vector $\beta$ be
$$\beta = \arg\min_{x \in \mathbb{R}^N}\{P_a(x) \mid Ax = y\}. \tag{2.6}$$
The necessary and sufficient condition for exact recovery, namely $\beta = \bar\beta$, is the generalized null space property (gNSP, [37]):
$$\mathrm{Ker}(A)\setminus\{0\} \subseteq \mathrm{gNS} := \{v \in \mathbb{R}^N : P_a(v_T) < P_a(v_{T^c}),\ \forall T,\ |T| \le s\}, \tag{2.7}$$
where $|T|$ is the cardinality (the number of elements) of the set $T$, and $T^c$ is its complement. The gNSP generalizes the well-known NSP for $\ell_1$ exact recovery [ ]:
$$\mathrm{Ker}(A)\setminus\{0\} \subseteq \mathrm{NS} := \{v \in \mathbb{R}^N : \|v_T\|_1 < \|v_{T^c}\|_1,\ \forall T,\ |T| \le s\}. \tag{2.8}$$
For the class of separable, concave and symmetric penalties [37], including $\ell_p$, TL1, capped $\ell_1$ ($\sum_i \min(|x_i|, \theta)$, $\theta > 0$), SCAD, PiE, and MCP (see Table 5.1), [37] proved that gNSP is the necessary and sufficient condition for exact sparse recovery, while being no more restrictive than NSP. In fact, the inclusion $\mathrm{NS} \subseteq \mathrm{gNS}$ holds for this class of penalties (Proposition 3.3, [37]).
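The inequalities of Lemmas 2.1 and 2.2 above are easy to spot-check numerically. The following is a minimal sketch of such a check; the tolerance and sample count are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
a = 1.0
rho = lambda t: (a + 1.0) * abs(t) / (a + abs(t))

# Spot-check the chain of inequalities (2.4) and the scaling bounds (2.5).
for _ in range(1000):
    xi, xj, c = rng.normal(size=3)
    assert rho(xi + xj) <= rho(abs(xi) + abs(xj)) + 1e-12
    assert rho(abs(xi) + abs(xj)) <= rho(xi) + rho(xj) + 1e-12
    assert rho(xi) + rho(xj) <= 2.0 * rho((abs(xi) + abs(xj)) / 2.0) + 1e-12
    if abs(c) > 1:   # rho_a(|ct|) <= |c| rho_a(|t|) for |c| > 1
        assert rho(c * xi) <= abs(c) * rho(xi) + 1e-12
    else:            # reverse inequality for |c| <= 1
        assert rho(c * xi) >= abs(c) * rho(xi) - 1e-12
```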

It follows that if exact recovery holds for a matrix $A$ by $\ell_1$, it is also true for any of these concave penalties. By a scaling argument, we show that:

Theorem 2.1. NSP of $\ell_1$ is equivalent to gNSP of SCAD or capped $\ell_1$.

In other words, minimizing the non-convex penalties SCAD and capped $\ell_1$ has no gain over minimizing $\ell_1$ in the exact sparse recovery problem.

Proof. Consider any $v \in \mathrm{Ker}(A)\setminus\{0\}$ satisfying gNSP (2.7). Let $\epsilon \in (0,1)$ be less than $\theta/\|v\|_\infty$ ($\beta/\|v\|_\infty$) in the case of capped $\ell_1$ (SCAD). Then $\epsilon v \in \mathrm{Ker}(A)\setminus\{0\}$ also satisfies gNSP, and
$$P_a(\epsilon v_T) < P_a(\epsilon v_{T^c}), \quad \forall T,\ |T| \le s,$$
which, since both penalties are linear on this small scale, is the same as
$$\epsilon\|v_T\|_1 < \epsilon\|v_{T^c}\|_1, \quad \forall T,\ |T| \le s,$$
implying that $v$ satisfies NSP. Hence gNSP $\Rightarrow$ NSP for SCAD and capped $\ell_1$.

We give an example of a square matrix below to show that the inclusion $\mathrm{NS} \subseteq \mathrm{gNS}$ is strict for TL1, $\ell_p$, PiE, MCP, implying that exact recovery by any one of these four penalties holds while that of $\ell_1$ (also SCAD, capped $\ell_1$) fails. Consider a $3 \times 3$ matrix $A$ with
$$\mathrm{Ker}(A) = \mathrm{span}\{(1, -2, 1)^\top\},$$
and the linear constraint $Ax = A\bar\beta$, whose sparsest solution is $\bar\beta = (1, 0, 0)^\top$, $s = 1$. The set $T$ has cardinality 1. If $T = \{2\}$, then for any nonzero vector $v = t(1, -2, 1)^\top$ in $\mathrm{Ker}(A)$, $t \neq 0$, we have $v_T = -2t$ and $v_{T^c} = t(1, 1)^\top$. So $\|v_T\|_1 = 2|t| = \|v_{T^c}\|_1$, and NSP fails. To verify gNSP at $T = \{2\}$ for TL1, we have
$$P_a(v_T) = \frac{(a+1)2|t|}{a+2|t|} < \frac{2(a+1)|t|}{a+|t|} = P_a(v_{T^c}).$$
At $T = \{1\}$, $v_T = t$, $v_{T^c} = t(-2, 1)^\top$, and we have
$$P_a(v_T) = \frac{(a+1)|t|}{a+|t|} < P_a(v_{T^c}) = \frac{(a+1)2|t|}{a+2|t|} + \frac{(a+1)|t|}{a+|t|}.$$
The case $T = \{3\}$ is the same, and gNSP holds for TL1. A similar verification of the validity of gNSP can be done for $\ell_p$, PiE and MCP. It suffices to check $T = \{2\}$, where the largest component of the null vector (in absolute value) is located. For $\ell_p$ and PiE, $P_a(\cdot)$ is strictly concave on $\mathbb{R}_+$, so Jensen's inequality gives $P_a(v_T) = P_a(0) + P_a(2|t|) < 2P_a(|t|) = P_a(v_{T^c})$. For MCP, $P_a$ is quadratic and strictly concave on $[0, \alpha\beta]$, hence $P_a(v_T) < P_a(v_{T^c})$ if $2|t| \le \alpha\beta$. If $|t| < \alpha\beta < 2|t|$, the line connecting $(0, 0)$ and $(2|t|, P_a(2|t|))$ is still strictly below the $P_a$ curve, hence $P_a(v_T) < P_a(v_{T^c})$ holds. If $|t| \ge \alpha\beta$, then $P_a(v_T) = P_a(2|t|) = \alpha\beta^2/2 < \alpha\beta^2 = 2P_a(|t|) = P_a(v_{T^c})$.

The example can be extended to a block diagonal $M \times M$ matrix ($M > 3$) of the form $\mathrm{diag}(A, B)$, where $B$ is any invertible $(M-3) \times (M-3)$ square matrix, with the right-hand side vector of the linear constraint being $(A\bar\beta, 0, \ldots, 0)^\top$. The following gives a rectangular $3 \times 4$ matrix (in the class of fat sensing matrices of CS) where NSP fails and gNSP prevails: a matrix $A_f$ with $\mathrm{rank}(A_f) = 3$ and $\mathrm{Ker}(A_f) = \mathrm{span}\{(1, -2, 1, 0)^\top\}$,

for the linear constraint $A_f x = A_f\bar\beta$, $x \in \mathbb{R}^4$, whose sparsest solution is $\bar\beta = (1, 0, 0, 0)^\top$, $s = 1$. Since the last component of any null vector is zero, NSP fails at $T = \{2\}$ as in the $3 \times 3$ example, while the gNSP inequalities remain valid at $T \not\ni 4$. Clearly, the gNSP inequality also holds at $T = \{4\}$. To summarize, we state:

Remark 2.2. The set of matrices satisfying gNSP of the concave penalties ($\ell_p$, TL1, PiE, MCP) can be larger than that of NSP. Since matrices satisfying NSP or gNSP tend to be incoherent, we expect that exact recovery by ($\ell_p$, TL1, PiE, MCP) is better than ($\ell_1$, capped $\ell_1$, SCAD) in this regime. This phenomenon is partly observed in our numerical experiments later.

Checking NSP or gNSP is NP-hard in general. The restricted isometry property (RIP) provides a sufficient condition for $\ell_1$ exact recovery or NSP, and is satisfied with overwhelming probability by Gaussian random matrices with i.i.d. entries [4]. By the inclusion relation $\mathrm{NS} \subseteq \mathrm{gNS}$, the minimization of any one of the concave penalties above in the setting of (2.6) recovers exactly the minimal $\ell_0$ solution $\bar\beta$ for Gaussian random matrices with i.i.d. entries with overwhelming probability.

Though gNSP (2.7) is sharp for exact recovery, it is only applicable for precise measurements, i.e. when the linear constraint holds exactly. If there is any measurement error, can one recover the $\ell_0$ solution up to a certain tolerance (error bound)? To answer such a stable recovery question for TL1, we carry out a RIP analysis below to show that a stable recovery of $\bar\beta$ is possible based on a normalized TL1 minimization problem. Naturally, the RIP analysis also gives an exact recovery result when the measurement is error free. Though sub-optimal, it is the first step towards a stable recovery theory. To begin, we recall:

Definition 2.1. (Restricted Isometry Constant) For each number $s$, define the $s$-restricted isometry constant of a matrix $A$ as the smallest number $\delta_s \in (0, 1)$ such that for all column index subsets $T$ with $|T| \le s$ and all $x \in \mathbb{R}^{|T|}$, the inequality
$$(1 - \delta_s)\|x\|_2^2 \le \|A_T x\|_2^2 \le (1 + \delta_s)\|x\|_2^2$$
holds. The matrix $A$ is said to satisfy the $s$-RIP with $\delta_s$.

Due to the lack of scaling property of TL1, we introduce a normalization procedure to recover $\bar\beta$. For a fixed $y$, the under-determined linear constraint has infinitely many solutions. Let $x^0$ be a solution of $Ax = y$, not necessarily the $\ell_0$ or $\rho_a$ minimizer. If $P_a(x^0) > 1$, we scale $y$ by a positive scalar $C$ as
$$y_C = \frac{y}{C}; \qquad x^0_C = \frac{x^0}{C}. \tag{2.9}$$
Now $x^0_C$ is a solution to the equivalent scaled constraint $Ax_C = y_C$. When $C$ becomes larger, the number $P_a(x^0_C)$ is smaller and tends to $0$ in the limit $C \to \infty$. Thus, we can find a constant $C$ such that $P_a(x^0_C) \le 1$. That is to say, for the scaled vector $x^0_C$, we always have $P_a(x^0_C) \le 1$. Since the penalty $\rho_a(t)$ is increasing in the positive variable $t$, we have
$$P_a(x^0_C) \le |T_0|\,\rho_a(\|x^0_C\|_\infty) = |T_0|\,\rho_a\Big(\frac{\|x^0\|_\infty}{C}\Big) = |T_0|\,\frac{(a+1)\|x^0\|_\infty}{aC + \|x^0\|_\infty},$$
where $|T_0|$ is the cardinality of the support set $T_0$ of the vector $x^0$. For $P_a(x^0_C) \le 1$,
$$|T_0|\,\frac{(a+1)\|x^0\|_\infty}{aC + \|x^0\|_\infty} \le 1$$

suffices, or
$$C \ge \frac{\|x^0\|_\infty}{a}\big(a|T_0| + |T_0| - 1\big). \tag{2.10}$$
Let $\bar\beta$ be the $\ell_0$ minimizer of the constrained $\ell_0$ optimization problem (1.1), with support set $T$. Due to the scale-invariance of $\ell_0$, $\bar\beta_C$ (defined similarly as above) is a global $\ell_0$ minimizer of the normalized problem
$$\min_x \|x\|_0, \quad \text{s.t. } y_C = Ax, \tag{2.11}$$
with the same support set $T$. The exact recovery result is stated below, with proof in Appendix A, for the normalized $\rho_a$ minimization problem
$$\min_x P_a(x), \quad \text{s.t. } y_C = Ax. \tag{2.12}$$

Theorem 2.2. (Exact TL1 Sparse Recovery) For a given sensing matrix $A$, let $\bar\beta_C$ be a minimizer of (2.11), with $C$ satisfying (2.10). Let $T$ be the support set of $\bar\beta_C$, with cardinality $|T|$. Suppose there is a number $R > |T|$ such that
$$b = \Big(\frac{a}{a+1}\Big)^2\frac{R}{|T|} > 1 \quad \text{and} \quad \delta_R + b\,\delta_{R+|T|} < b - 1; \tag{2.13}$$
then the minimizer $\beta_C$ of (2.12) is unique and equal to the minimizer $\bar\beta_C$ of (2.11). Moreover, $C\beta_C$ is the unique minimizer of the $\ell_0$ minimization problem (1.1).

Remark 2.3. In Theorem 2.2, if we choose $R = 3|T|$, the RIP condition (2.13) becomes
$$\delta_{3|T|} + \frac{3a^2}{(a+1)^2}\,\delta_{4|T|} < \frac{3a^2}{(a+1)^2} - 1.$$
This inequality approaches $\delta_{3|T|} + 3\delta_{4|T|} < 2$ as the parameter $a$ goes to $+\infty$, which is the RIP condition in [4] satisfied by Gaussian random matrices with i.i.d. entries. The RIP condition (2.13) is satisfied by the same class of Gaussian matrices when $a$ is sufficiently large, though it is more stringent when $a$ gets smaller. This is due to the lack of scaling property of the TL1 penalty and the sub-optimal treatment in the RIP analysis. Hence the true advantage of the TL1 penalty for CS problems, to be seen in our numerical results later, is not reflected in the RIP condition. Theoretically, it is an open question to find random matrices that satisfy gNSP of TL1 but not NSP.

The RIP analysis of exact TL1 recovery allows a stable recovery analysis, stated below with proof in Appendix B. For a positive number $\tau$, we consider the problem
$$\min_x P_a(x), \quad \text{s.t. } \|y_C - Ax\|_2 \le \tau. \tag{2.14}$$

Theorem 2.3. (Stable TL1 Sparse Recovery) Under the same RIP condition (2.13) as in Theorem 2.2, the solution $\beta^n_C$ of problem (2.14) satisfies the inequality
$$\|\beta^n_C - \bar\beta_C\|_2 \le D\tau,$$
for a positive constant $D$ depending only on $\delta_R$ and $\delta_{R+|T|}$.
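A small sketch of the normalization step (2.9)-(2.10) is given below; the function names and the example vector are illustrative, and the sketch only verifies the bound $P_a(x^0/C) \le 1$ that the scaling is designed to enforce.

```python
import numpy as np

def scaling_constant(x0, a=1.0):
    """Smallest C allowed by (2.10): C >= (||x0||_inf / a) * (a|T0| + |T0| - 1),
    which guarantees P_a(x0 / C) <= 1 for the normalized problem (2.12)."""
    T0 = np.count_nonzero(x0)
    return (np.linalg.norm(x0, np.inf) / a) * ((a + 1.0) * T0 - 1.0)

def P_a(x, a=1.0):
    x = np.abs(x)
    return np.sum((a + 1.0) * x / (a + x))

x0 = np.array([0.0, 3.0, 0.0, -2.0, 5.0])    # any solution of Ax = y (illustrative)
C = scaling_constant(x0)
assert P_a(x0 / C) <= 1.0 + 1e-12             # normalized solution satisfies P_a <= 1
```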

2.2. Sparsity of Local Minimizers

We study properties of local minimizers of both the constrained problem (2.2) and the unconstrained model (2.3). As in $\ell_p$ and $\ell_{1-2}$ minimization [42, 23], a local minimizer of TL1 minimization extracts linearly independent columns from the sensing matrix $A$, without requiring $A$ to satisfy NSP.

Theorem 2.4. (Local minimizer of the constrained model) Suppose $x^*$ is a local minimizer of the constrained problem (2.2) and $T = \mathrm{supp}(x^*)$; then $A_T$ is of full column rank, i.e. the columns of $A_T$ are linearly independent.

Proof. We argue by contradiction. Suppose that the column vectors of $A_T$ are not linearly independent. Then there exists a non-zero vector $v \in \mathrm{ker}(A)$ such that $\mathrm{supp}(v) \subseteq T$. For any neighbourhood $B(x^*, r)$ of $x^*$, we can scale $v$ so that
$$\|v\|_2 \le \min\{r;\ |x^*_i|,\ i \in T\}. \tag{2.15}$$
Next we define
$$\xi_1 = x^* + v, \qquad \xi_2 = x^* - v,$$
so $\xi_1, \xi_2 \in B(x^*, r)$ and $x^* = \frac{1}{2}(\xi_1 + \xi_2)$. On the other hand, from $\mathrm{supp}(v) \subseteq T$, we have that $\mathrm{supp}(\xi_1), \mathrm{supp}(\xi_2) \subseteq T$. Moreover, due to the inequality (2.15), the vectors $x^*$, $\xi_1$, and $\xi_2$ are located in the same orthant, i.e. $\mathrm{sign}(x^*_i) = \mathrm{sign}(\xi_{1,i}) = \mathrm{sign}(\xi_{2,i})$ for any index $i$. It means that $\frac{1}{2}|\xi_1| + \frac{1}{2}|\xi_2| = \frac{1}{2}|\xi_1 + \xi_2|$ componentwise. Since the penalty function $\rho_a(t)$ is strictly concave in the non-negative variable $t$,
$$\frac{1}{2}P_a(\xi_1) + \frac{1}{2}P_a(\xi_2) = \frac{1}{2}P_a(|\xi_1|) + \frac{1}{2}P_a(|\xi_2|) < P_a\Big(\frac{1}{2}|\xi_1| + \frac{1}{2}|\xi_2|\Big) = P_a\Big(\frac{1}{2}|\xi_1 + \xi_2|\Big) = P_a(x^*).$$
So for any fixed $r$, we can find two vectors $\xi_1$ and $\xi_2$ in the neighbourhood $B(x^*, r)$ such that
$$\min\{P_a(\xi_1), P_a(\xi_2)\} \le \frac{1}{2}P_a(\xi_1) + \frac{1}{2}P_a(\xi_2) < P_a(x^*).$$
Both vectors are in the feasible set of the constrained problem (2.2), in contradiction with the assumption that $x^*$ is a local minimizer.

The same property also holds for the local minimizers of the unconstrained model (2.3), because a local minimizer of the unconstrained problem is also a local minimizer of a constrained optimization model [4, 42]. We skip the details and state the result below.

Theorem 2.5. (Local minimizer of the unconstrained model) Suppose $x^*$ is a local minimizer of the unconstrained problem (2.3) and $T = \mathrm{supp}(x^*)$; then the columns of $A_T$ are linearly independent.

Remark 2.4. From the two theorems above, we conclude the following: (i) for any local minimizer $x^*$ of (2.2) or (2.3), the sparsity of $x^*$ is at most $\mathrm{rank}(A)$; (ii) the number of local minimizers is finite, for both problems (2.2) and (2.3).

3. DC Algorithm for the Transformed $\ell_1$ Penalty

DC (Difference of Convex functions) programming and DCA (DC Algorithms) were introduced in 1985 by Pham Dinh Tao, and extensively developed by Le Thi Hoai An, Pham Dinh Tao and their coworkers to become a useful tool for non-convex optimization

and sparse signal recovery ([32, 9, 2, 2] and references therein). A standard DC program is of the form
$$\alpha = \inf\{f(x) = g(x) - h(x) : x \in \mathbb{R}^N\} \quad (P_{dc}),$$
where $g, h$ are lower semicontinuous proper convex functions on $\mathbb{R}^N$. Here $f$ is called a DC function, while $g - h$ is a DC decomposition of $f$. The DCA is an iterative method and generates a sequence $\{x^k\}$. At the current iterate $x^l$, the function $h(x)$ is approximated by its affine minorization $h_l(x)$, defined by
$$h_l(x) = h(x^l) + \langle x - x^l, y^l \rangle, \quad y^l \in \partial h(x^l),$$
where the subdifferential $\partial h(x)$ at $x \in \mathrm{dom}(h)$ is the closed convex set
$$\partial h(x) := \{y \in \mathbb{R}^N : h(z) \ge h(x) + \langle z - x, y \rangle,\ \forall z \in \mathbb{R}^N\}, \tag{3.1}$$
which generalizes the derivative in the sense that $h$ is differentiable at $x$ if and only if $\partial h(x)$ is the singleton $\{\nabla h(x)\}$. The minorization gives a convex program of the form
$$\inf\{g(x) - h_l(x) : x \in \mathbb{R}^N\} \iff \inf\{g(x) - \langle x, y^l \rangle : x \in \mathbb{R}^N\},$$
whose optimal solution is denoted by $x^{l+1}$. In the following, we present DCAs for TL1-regularized problems; see related DCAs in [9, 2]. We refer to [2] for DCAs on general sparse-penalty-regularized problems and the consistency analysis (convergence of global minimizers of the regularized problems to the $\ell_0$ minimizers).

3.1. DC Form of TL1

The TL1 penalty function $\rho_a(\cdot)$ is written as a difference of two convex functions:
$$\rho_a(t) = \frac{(a+1)|t|}{a+|t|} = \frac{(a+1)|t|}{a} - \Big(\frac{(a+1)|t|}{a} - \frac{(a+1)|t|}{a+|t|}\Big) = \frac{(a+1)|t|}{a} - \frac{(a+1)t^2}{a(a+|t|)}, \tag{3.2}$$
where the second term is $C^1$. The generalized derivative of the function $P_a(\cdot)$ is
$$\partial P_a(x) = \frac{a+1}{a}\,\partial\|x\|_1 - \nabla\varphi_a(x), \tag{3.3}$$
where
$$\varphi_a(x) = \sum_{i=1}^{N}\frac{(a+1)\,x_i^2}{a(a+|x_i|)} \tag{3.4}$$
is a $C^1$ function with regular gradient, and $\partial\|x\|_1$ is the subdifferential of $\|x\|_1$, i.e. $\partial\|x\|_1 = \{\mathrm{sgn}(x_i)\}_{i=1,\ldots,N}$, where
$$\mathrm{sgn}(t) = \begin{cases} \mathrm{sign}(t), & \text{if } t \neq 0, \\ [-1, 1], & \text{otherwise.} \end{cases} \tag{3.5}$$
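A minimal sketch of the DC split (3.2)-(3.4) follows: it evaluates the smooth part $\varphi_a$ and its gradient (the closed-form derivative used in the comment is elementary calculus, not quoted from the paper) and numerically checks the decomposition.

```python
import numpy as np

a = 1.0

def rho_a(t):
    t = np.abs(t)
    return (a + 1.0) * t / (a + t)

def phi_a(x):
    """Smooth (C^1) part of the DC split: phi_a(x) = sum_i (a+1) x_i^2 / (a (a+|x_i|))."""
    return np.sum((a + 1.0) * x**2 / (a * (a + np.abs(x))))

def grad_phi_a(x):
    """Gradient of phi_a, used to build the DCA subgradient v^n in Algorithm 1:
    d/dt [(a+1) t^2 / (a(a+|t|))] = (a+1) t (2a + |t|) / (a (a+|t|)^2)."""
    return (a + 1.0) * x * (2.0 * a + np.abs(x)) / (a * (a + np.abs(x))**2)

# Sanity check of the decomposition rho_a(t) = (a+1)|t|/a - phi_a(t), summed over components.
x = np.random.default_rng(1).normal(size=5)
assert np.allclose(np.sum(rho_a(x)), (a + 1.0) / a * np.sum(np.abs(x)) - phi_a(x))
```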

3.2. Algorithm for the Unconstrained Model: DCATL

For the unconstrained optimization problem (2.3),
$$\min_{x \in \mathbb{R}^N} f(x) = \min_{x \in \mathbb{R}^N}\frac{1}{2}\|Ax - y\|_2^2 + \lambda P_a(x),$$
a DC decomposition is $f(x) = g(x) - h(x)$, where
$$g(x) = \frac{1}{2}\|Ax - y\|_2^2 + c\|x\|_2^2 + \lambda\frac{a+1}{a}\|x\|_1; \qquad h(x) = \lambda\varphi_a(x) + c\|x\|_2^2. \tag{3.6}$$
Here the function $\varphi_a(x)$ is defined in equation (3.4). The additional term $c\|x\|_2^2$ with a positive hyperparameter $c > 0$ is used to improve the convexity of these two functions, and will be used in the convergence theorem.

Algorithm 1: DCA for unconstrained transformed $\ell_1$ penalty minimization
Define: $\epsilon_{outer} > 0$. Initialize: $x^0 = 0$, $n = 0$.
while $\|x^{n+1} - x^n\| > \epsilon_{outer}$ do
  $v^n \in \partial h(x^n) = \lambda\nabla\varphi_a(x^n) + 2cx^n$
  $x^{n+1} = \arg\min_{x \in \mathbb{R}^N}\big\{\frac{1}{2}\|Ax - y\|_2^2 + c\|x\|_2^2 + \lambda\frac{a+1}{a}\|x\|_1 - \langle x, v^n \rangle\big\}$
  then $n + 1 \to n$
end while

At each step, we solve a strongly convex $\ell_1$-regularized sub-problem:
$$x^{n+1} = \arg\min_{x \in \mathbb{R}^N}\Big\{\frac{1}{2}\|Ax - y\|_2^2 + c\|x\|_2^2 + \lambda\frac{a+1}{a}\|x\|_1 - \langle x, v^n \rangle\Big\} = \arg\min_{x \in \mathbb{R}^N}\Big\{\frac{1}{2}x^T(A^TA + 2cI)x - \langle x, v^n + A^Ty \rangle + \lambda\frac{a+1}{a}\|x\|_1\Big\}. \tag{3.7}$$
We now employ the Alternating Direction Method of Multipliers (ADMM) [2]. After introducing a new variable $z$, the sub-problem is recast as
$$\min_{x, z \in \mathbb{R}^N}\Big\{\frac{1}{2}x^T(A^TA + 2cI)x - \langle x, v^n + A^Ty \rangle + \lambda\frac{a+1}{a}\|z\|_1\Big\} \quad \text{s.t. } x - z = 0. \tag{3.8}$$
Define the augmented Lagrangian function as
$$L(x, z, u) = \frac{1}{2}x^T(A^TA + 2cI)x - \langle x, v^n + A^Ty \rangle + \lambda\frac{a+1}{a}\|z\|_1 + \frac{\delta}{2}\|x - z\|_2^2 + u^T(x - z),$$
where $u$ is the Lagrange multiplier and $\delta > 0$ is a penalty parameter. The ADMM consists of three iterations:
$$x^{k+1} = \arg\min_x L(x, z^k, u^k); \qquad z^{k+1} = \arg\min_z L(x^{k+1}, z, u^k); \qquad u^{k+1} = u^k + \delta(x^{k+1} - z^{k+1}).$$
The first two steps have closed-form solutions and are described in Algorithm 2, where $\mathrm{shrink}(\cdot, \cdot)$ is the soft-thresholding operator given by
$$\mathrm{shrink}(x, r)_i = \mathrm{sgn}(x_i)\max\{|x_i| - r, 0\}.$$
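The soft-thresholding operator above has a one-line implementation; the sketch below is illustrative.

```python
import numpy as np

def shrink(x, r):
    """Soft-thresholding: shrink(x, r)_i = sgn(x_i) * max(|x_i| - r, 0)."""
    return np.sign(x) * np.maximum(np.abs(x) - r, 0.0)

print(shrink(np.array([1.5, -0.2, 0.7]), 0.5))   # -> [1.0, 0.0, 0.2]
```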

Algorithm 2: ADMM for the subproblem (3.7)
Initial guess: $x^0$, $z^0$, $u^0$ and iteration index $k = 0$.
while not converged do
  $x^{k+1} := (A^TA + 2cI + \delta I)^{-1}\big(A^Ty + v^n + \delta z^k - u^k\big)$
  $z^{k+1} := \mathrm{shrink}\big(x^{k+1} + u^k/\delta,\ \frac{(a+1)\lambda}{a\delta}\big)$
  $u^{k+1} := u^k + \delta(x^{k+1} - z^{k+1})$
  then $k + 1 \to k$
end while

3.3. Convergence Theory for Unconstrained DCATL

We present a convergence theory for Algorithm 1 (DCATL). We prove that the sequence $\{f(x^n)\}$ is decreasing and convergent, while the sequence $\{x^n\}$ is bounded under a requirement on $\lambda$. Its subsequential limit vector $x^*$ is a stationary point satisfying the first order optimality condition. Our proof is based on the convergence theory of the $\ell_{1-2}$ penalty function [42], besides the general DCA results [33, 34].

Definition 3.1. (Modulus of strong convexity) For a convex function $f(x)$, the modulus of strong convexity of $f$ on $\mathbb{R}^N$, denoted by $m(f)$, is defined by
$$m(f) := \sup\{\rho > 0 : f - \tfrac{\rho}{2}\|\cdot\|_2^2 \text{ is convex on } \mathbb{R}^N\}.$$

Let us recall an inequality from Proposition A.1 in [34] concerning the sequence $\{f(x^n)\}$.

Lemma 3.1. Suppose that $f(x) = g(x) - h(x)$ is a DC decomposition and the sequence $\{x^n\}$ is generated by (3.7). Then
$$f(x^n) - f(x^{n+1}) \ge \frac{m(g) + m(h)}{2}\,\|x^{n+1} - x^n\|_2^2.$$

The convergence theory for our unconstrained algorithm DCATL is below. The objective function is $f(x) = \frac{1}{2}\|Ax - y\|_2^2 + \lambda P_a(x)$.

Theorem 3.1. The sequences $\{x^n\}$ and $\{f(x^n)\}$ in Algorithm 1 satisfy:
1. The sequence $\{f(x^n)\}$ is decreasing and convergent.
2. $\|x^{n+1} - x^n\|_2 \to 0$ as $n \to \infty$. If $\lambda > \frac{\|y\|_2^2}{2(a+1)}$, then $\{x^n\}_{n=0}^{\infty}$ is bounded.
3. Any subsequential limit vector $x^*$ of $\{x^n\}$ satisfies the first order optimality condition
$$0 \in A^T(Ax^* - y) + \lambda\,\partial P_a(x^*), \tag{3.9}$$
implying that $x^*$ is a stationary point of (2.3).

Proof. 1. By the definition of $g(x)$ and $h(x)$ in equation (3.6), it is easy to see that $m(g) \ge 2c$ and $m(h) \ge 2c$.

By Lemma 3.1, we have
$$f(x^n) - f(x^{n+1}) \ge \frac{m(g) + m(h)}{2}\,\|x^{n+1} - x^n\|_2^2 \ge 2c\,\|x^{n+1} - x^n\|_2^2.$$
So the sequence $\{f(x^n)\}$ is decreasing and non-negative, thus convergent.

2. It follows from the convergence of $\{f(x^n)\}$ that
$$\|x^{n+1} - x^n\|_2^2 \le \frac{f(x^n) - f(x^{n+1})}{2c} \to 0, \quad \text{as } n \to \infty.$$
If $y = 0$, then since the initial vector is $x^0 = 0$ and the sequence $\{f(x^n)\}$ is decreasing, we have $f(x^n) = 0$ for all $n$. So $x^n = 0$, and boundedness holds. Consider a non-zero vector $y$. Then
$$f(x^n) = \frac{1}{2}\|Ax^n - y\|_2^2 + \lambda P_a(x^n) \le f(x^0) = \frac{1}{2}\|y\|_2^2.$$
So $\lambda P_a(x^n) \le \frac{1}{2}\|y\|_2^2$, implying $2\lambda\rho_a(\|x^n\|_\infty) \le \|y\|_2^2$, or
$$\frac{2\lambda(a+1)\|x^n\|_\infty}{a + \|x^n\|_\infty} \le \|y\|_2^2.$$
If $\lambda > \frac{\|y\|_2^2}{2(a+1)}$, then
$$\|x^n\|_\infty \le \frac{a\|y\|_2^2}{2\lambda(a+1) - \|y\|_2^2}.$$
Thus the sequence $\{x^n\}_{n=0}^{\infty}$ is bounded.

3. Let $\{x^{n_k}\}$ be a subsequence of $\{x^n\}$ which converges to $x^*$. The optimality condition at the $n_k$-th step of Algorithm 1 is expressed as
$$0 \in A^T(Ax^{n_k} - y) + 2c(x^{n_k} - x^{n_k-1}) + \lambda\frac{a+1}{a}\,\partial\|x^{n_k}\|_1 - \lambda\nabla\varphi_a(x^{n_k-1}). \tag{3.10}$$
Since $\|x^{n+1} - x^n\|_2 \to 0$ as $n \to \infty$ and $x^{n_k}$ converges to $x^*$, as shown in Proposition 3.1 of [42], we also have $x^{n_k-1} \to x^*$. Letting $n_k \to \infty$ in (3.10), we have
$$0 \in A^T(Ax^* - y) + \lambda\frac{a+1}{a}\,\partial\|x^*\|_1 - \lambda\nabla\varphi_a(x^*).$$
By the definition of $\partial P_a(x)$ in (3.3), we have $0 \in A^T(Ax^* - y) + \lambda\,\partial P_a(x^*)$.
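To make the preceding description concrete, the following is a minimal Python sketch of Algorithm 1 with the ADMM inner solver of Algorithm 2. The default values of $\lambda$, $c$, $\delta$ and the stopping tests are illustrative choices, not the exact settings of our experiments, and the sign convention for the multiplier $u$ follows the reconstruction above.

```python
import numpy as np

def shrink(x, r):
    return np.sign(x) * np.maximum(np.abs(x) - r, 0.0)

def grad_phi_a(x, a):
    return (a + 1.0) * x * (2.0 * a + np.abs(x)) / (a * (a + np.abs(x))**2)

def dcatl_unconstrained(A, y, lam=1e-5, a=1.0, c=1e-6, delta=1.0,
                        eps_outer=1e-5, max_outer=200,
                        eps_inner=1e-8, max_inner=500):
    """Sketch of Algorithms 1-2: DCA outer loop with an ADMM inner solver for the
    strongly convex l1 subproblem (3.7). Parameter defaults are illustrative."""
    M, N = A.shape
    x = np.zeros(N)                       # zero start: first outer step is a weighted l1 problem
    Q_inv = np.linalg.inv(A.T @ A + (2.0 * c + delta) * np.eye(N))   # one-time inversion
    Aty = A.T @ y
    thresh = lam * (a + 1.0) / (a * delta)
    for _ in range(max_outer):
        v = lam * grad_phi_a(x, a) + 2.0 * c * x      # v^n in the subdifferential of h at x^n
        # --- ADMM on the subproblem (3.7)-(3.8) ---
        z, u, x_new = x.copy(), np.zeros(N), x.copy()
        for _ in range(max_inner):
            x_prev = x_new
            x_new = Q_inv @ (Aty + v + delta * z - u)
            z = shrink(x_new + u / delta, thresh)
            u = u + delta * (x_new - z)
            if np.linalg.norm(x_new - x_prev) <= eps_inner * max(np.linalg.norm(x_new), 1.0):
                break
        converged = np.linalg.norm(x_new - x) <= eps_outer * max(np.linalg.norm(x_new), 1.0)
        x = x_new
        if converged:
            break
    return x
```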

3.4. Algorithm for the Constrained Model

Here we also give a DCA scheme to solve the constrained problem (2.2),
$$\min_{x \in \mathbb{R}^N} P_a(x) \quad \text{s.t. } Ax = y.$$
We can rewrite the above optimization as
$$\min_{x \in \mathbb{R}^N}\frac{a+1}{a}\|x\|_1 - \varphi_a(x) \quad \text{s.t. } Ax = y,$$
or
$$\min_{x \in \mathbb{R}^N}\frac{a+1}{a}\|x\|_1 + \chi_{\{Ax=y\}}(x) - \varphi_a(x) = g(x) - h(x), \tag{3.11}$$
where $g(x) = \frac{a+1}{a}\|x\|_1 + \chi_{\{Ax=y\}}(x)$ is a polyhedral convex function [33]. Let $z = \nabla\varphi_a(x^n)$; then the convex sub-problem is
$$\min_{x \in \mathbb{R}^N}\frac{a+1}{a}\|x\|_1 - \langle z, x \rangle \quad \text{s.t. } Ax = y. \tag{3.12}$$
To solve (3.12), we introduce two Lagrange multipliers $u$, $v$ and define an augmented Lagrangian
$$L_\delta(x, w, u, v) = \frac{a+1}{a}\|w\|_1 - z^Tx + u^T(x - w) + v^T(Ax - y) + \frac{\delta}{2}\|x - w\|_2^2 + \frac{\delta}{2}\|Ax - y\|_2^2,$$
where $\delta > 0$. ADMM finds a saddle point $(x^*, w^*, u^*, v^*)$ such that
$$L_\delta(x^*, w^*, u, v) \le L_\delta(x^*, w^*, u^*, v^*) \le L_\delta(x, w, u^*, v^*) \quad \forall x, w, u, v,$$
by alternately minimizing $L_\delta$ with respect to $x$, minimizing with respect to $w$, and updating the dual variables $u$ and $v$. The saddle point $x^*$ is a solution to (3.12). The overall algorithm for solving the constrained TL1 model is described in Algorithm 3.

Algorithm 3: DCA method for constrained TL1 minimization
Define $\epsilon_{outer} > 0$, $\epsilon_{inner} > 0$. Initialize $x^0 = 0$ and outer loop index $n = 0$.
while $\|x^n - x^{n+1}\| \ge \epsilon_{outer}$ do
  $z = \nabla\varphi_a(x^n)$
  Initialization of the inner loop: $x^0_{in} = w^0 = x^n$, $v^0 = 0$, $u^0 = 0$. Set inner index $j = 0$.
  while $\|x^j_{in} - x^{j+1}_{in}\| \ge \epsilon_{inner}$ do
    $x^{j+1}_{in} := (A^TA + I)^{-1}\big(w^j + A^Ty + (z - u^j - A^Tv^j)/\delta\big)$
    $w^{j+1} := \mathrm{shrink}\big(x^{j+1}_{in} + u^j/\delta,\ \frac{a+1}{a\delta}\big)$
    $u^{j+1} := u^j + \delta(x^{j+1}_{in} - w^{j+1})$
    $v^{j+1} := v^j + \delta(Ax^{j+1}_{in} - y)$
  end while
  $x^{n+1} = x^j_{in}$ and $n = n + 1$.
end while

According to the DC decomposition scheme (3.11), Algorithm 3 is a polyhedral DC program. A convergence theorem similar to that of the unconstrained model in the last section can be proved. Furthermore, due to the properties of polyhedral DC programs, this constrained DCA also has finite convergence: if the inner subproblem (3.12) is exactly solved, the sequence $\{x^n\}$ generated by this iterative DC algorithm has finitely many subsequential limit points [33].
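A minimal sketch of Algorithm 3 follows; the default $\delta$ and iteration caps are illustrative assumptions, and the stopping tests are simplified relative to the settings reported in Section 4.

```python
import numpy as np

def shrink(x, r):
    return np.sign(x) * np.maximum(np.abs(x) - r, 0.0)

def grad_phi_a(x, a):
    return (a + 1.0) * x * (2.0 * a + np.abs(x)) / (a * (a + np.abs(x))**2)

def dcatl_constrained(A, y, a=1.0, delta=1.0,
                      eps_outer=1e-5, max_outer=100,
                      eps_inner=1e-5, max_inner=2000):
    """Sketch of Algorithm 3: DCA for the constrained TL1 model (2.2), with an
    ADMM inner loop for the linearly constrained subproblem (3.12)."""
    M, N = A.shape
    x = np.zeros(N)
    Q_inv = np.linalg.inv(A.T @ A + np.eye(N))     # fixed matrix, inverted once
    Aty = A.T @ y
    thresh = (a + 1.0) / (a * delta)
    for _ in range(max_outer):
        z = grad_phi_a(x, a)                       # linearization of phi_a at x^n
        xin, w = x.copy(), x.copy()
        u, v = np.zeros(N), np.zeros(M)
        for _ in range(max_inner):
            x_prev = xin
            xin = Q_inv @ (w + Aty + (z - u - A.T @ v) / delta)
            w = shrink(xin + u / delta, thresh)
            u = u + delta * (xin - w)
            v = v + delta * (A @ xin - y)
            if np.linalg.norm(xin - x_prev) <= eps_inner * max(np.linalg.norm(xin), 1.0):
                break
        converged = np.linalg.norm(xin - x) <= eps_outer * max(np.linalg.norm(xin), 1.0)
        x = xin
        if converged:
            break
    return x
```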

4. Numerical Results

In this section, we use two classes of randomly generated matrices to illustrate the effectiveness of our algorithms: DCATL (the difference of convex algorithm for the transformed $\ell_1$ penalty) and its constrained version. We compare them separately with several state-of-the-art solvers for recovering sparse vectors:
unconstrained algorithms: (i) reweighted $\ell_{1/2}$ [8]; (ii) the DCA $\ell_{1-2}$ algorithm [42, 23]; (iii) CEL0 [35];
constrained algorithms: (i) the Bregman algorithm [44]; (ii) YALL1; (iii) Lp_RLS [ ].
All our tests were performed on a Lenovo desktop with 16 GB of RAM and an Intel Core i7-4770 processor with CPU at 3.4 GHz x 8 under a 64-bit Ubuntu system. The two classes of random matrices are: 1) Gaussian matrices; 2) over-sampled DCT matrices with factor $F$.

We did not use prior information on the true sparsity of the original signal $\bar x$. Also, for all the tests, the computation is initialized with zero vectors. In fact, DCATL does not guarantee a global minimum in general, due to the non-convexity of the problem. Indeed we observe that DCATL with random starts often gets stuck at local minima, especially when the matrix $A$ is ill-conditioned (e.g. $A$ has a large condition number or is highly coherent). In the numerical experiments, by setting $x^0 = 0$, we find that DCATL usually produces an optimal solution, exactly or almost equal to the ground truth vector. The intuition behind our choice is that with the zero vector as initial guess, the first step of our algorithm reduces to solving an unconstrained weighted $\ell_1$ problem. So basically we are minimizing TL1 on the basis of $\ell_1$, which is why minimization of TL1 initialized by $x^0 = 0$ always outperforms $\ell_1$; see [43] for a rigorous analysis.

4.1. Choice of the Parameter $a$

In DCATL, the parameter $a$ is also very important. When $a$ tends to zero, the penalty function approaches the $\ell_0$ norm. If $a$ goes to $+\infty$, the objective function becomes more convex and acts like the $\ell_1$ optimization. So choosing a better $a$ will improve the effectiveness and the success rate of our algorithm. We tested DCATL on recovering sparse vectors with the parameter $a$ varying among $\{0.1, 0.3, 1, 2, 10\}$. In this test, $A$ is a random matrix generated by the normal Gaussian distribution. The true vector $\bar x$ is also a randomly generated sparse vector, with sparsity varying over a range of values. Here the regularization parameter $\lambda$ was set to $10^{-5}$ for all tests. Although the best $\lambda$ may be sparsity-dependent in general, we considered the noiseless case and chose $\lambda = 10^{-5}$ (small and fixed) to approximately enforce $Ax = A\bar x$. For each $a$, we sampled 100 times with different $A$ and $\bar x$. The recovered vector $x_r$ is accepted and recorded as one success if the relative error satisfies $\frac{\|x_r - \bar x\|_2}{\|\bar x\|_2} \le 10^{-3}$. Fig. 4.1 shows the success rate of DCATL over the independent trials for various values of the parameter $a$ and the sparsity. From the figure, we see that DCATL with $a = 1$ is the best among all tested values. Also, the numerical results for $a = 0.3$ and

Fig. 4.1: Success rates of the unconstrained DCATL method versus sparsity for parameter values $a = 0.1, 0.3, 1, 2, 10$, with $M = 64$, $N = 256$.

$a = 2$ (near 1) are better than those with $a = 0.1$ and $a = 10$. This is because the objective function is more non-convex at a smaller $a$ and thus more difficult to solve. On the other hand, the iterations are more likely to stop at a local minimum far from the $\ell_0$ solution if $a$ is too large. Thus, in all the following tests, we set the parameter $a = 1$.

4.2. Numerical Experiments for the Unconstrained Algorithm

The stopping conditions for the outer loop are a relative iteration error $\frac{\|x^{n+1} - x^n\|_2}{\|x^{n+1}\|_2} < 10^{-5}$ and a maximum number of iteration steps. For the inner loop, the stopping conditions are a relative iteration error of $10^{-8}$ and a maximum number of iteration steps. The methods in comparison are applied with their default parameters.

4.2.1. Gaussian matrices

We use $\mathcal{N}(0, \Sigma)$, the multivariate normal distribution, to generate the Gaussian matrix $A$. Here the covariance matrix is $\Sigma = \{(1-r)\chi_{(i=j)} + r\}_{i,j}$, where the value of $r$ varies between 0 and 1. In theory, the larger $r$ is, the more difficult it is to recover the true sparse vector. For the matrix $A$, the row and column numbers are set to $M = 64$ and $N = 1024$. The sparsity varies over a range of values. We compare the four algorithms in terms of success rate. Denote by $x_r$ a reconstructed solution from a certain algorithm. We consider an algorithm to be successful if the relative error of $x_r$ to the true solution $\bar x$ is small, i.e. $\frac{\|x_r - \bar x\|_2}{\|\bar x\|_2} < 10^{-3}$. In order to improve the success rates for all compared algorithms, we set the tolerance parameter to be smaller or the maximum cycle number to be higher inside each algorithm. The success rate of each algorithm is plotted in Figure 4.2 for several values of the coherence parameter $r$. For all cases, DCATL and the reweighted $\ell_{1/2}$ algorithm (IRucLq-v) performed almost the same, and both were much better than the other two, while CEL0 has the lowest success rate.
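The correlated Gaussian sensing matrices of Section 4.2.1 can be generated as sketched below; the function name and the seed handling are illustrative, and the small diagonal jitter is an assumption added for numerical stability of the Cholesky factorization.

```python
import numpy as np

def correlated_gaussian_matrix(M, N, r, seed=None):
    """Rows of A drawn i.i.d. from N(0, Sigma) with Sigma_ij = (1-r) 1_{i=j} + r,
    as in Section 4.2.1; larger r gives a more coherent (harder) sensing matrix."""
    rng = np.random.default_rng(seed)
    Sigma = (1.0 - r) * np.eye(N) + r * np.ones((N, N))
    L = np.linalg.cholesky(Sigma + 1e-12 * np.eye(N))   # jitter for numerical stability
    return rng.standard_normal((M, N)) @ L.T

A = correlated_gaussian_matrix(64, 1024, r=0.4, seed=0)
print(A.shape)   # (64, 1024)
```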

Fig. 4.2: Success rates of the unconstrained algorithms (DCATL, IRucLq-v, DCA $\ell_{1-2}$, CEL0) for Gaussian matrices with $M = 64$, $N = 1024$ and different coherence parameters $r$.

4.2.2. Over-sampled DCT

The over-sampled DCT matrices [6, 23, 42] are
$$A = [a_1, \ldots, a_N] \in \mathbb{R}^{M \times N}, \quad a_j = \frac{1}{\sqrt{M}}\cos\Big(\frac{2\pi\,\omega\,(j-1)}{F}\Big), \quad j = 1, \ldots, N, \tag{4.1}$$
where $\omega$ is a random vector drawn uniformly from $(0, 1)^M$. Such matrices appear as the real part of the complex discrete Fourier matrices in spectral estimation [6]. An important property is their high coherence: for a matrix with $F = 10$, the coherence is about 0.998, while the coherence of a matrix of the same size with $F = 20$ is typically even closer to 1. Sparse recovery under such matrices is possible only if the non-zero elements of the solution $\bar x$ are sufficiently separated. This phenomenon is characterized as minimum separation in [7], and this minimum length is referred to as the Rayleigh length (RL). The value of RL for the matrix $A$ is equal to the factor $F$. It is closely related to the coherence in the sense that a larger $F$ corresponds to a larger coherence of the matrix.
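The over-sampled DCT construction (4.1) and its coherence can be reproduced with the short sketch below; the helper names are illustrative.

```python
import numpy as np

def oversampled_dct(M, N, F, seed=None):
    """Over-sampled DCT sensing matrix of (4.1):
    a_j = cos(2*pi*w*(j-1)/F) / sqrt(M), with w drawn uniformly from (0,1)^M."""
    rng = np.random.default_rng(seed)
    w = rng.uniform(size=(M, 1))
    j = np.arange(N)[None, :]          # 0-based index plays the role of (j - 1)
    return np.cos(2.0 * np.pi * w * j / F) / np.sqrt(M)

def mutual_coherence(A):
    """Largest absolute normalized inner product between distinct columns."""
    G = A / np.linalg.norm(A, axis=0, keepdims=True)
    C = np.abs(G.T @ G)
    np.fill_diagonal(C, 0.0)
    return C.max()

A = oversampled_dct(100, 1500, F=10, seed=0)
print(mutual_coherence(A))   # close to 1 for large F
```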

Table 4.1: The success rates (%) of DCATL for different combinations of sparsity and minimum separation lengths (multiples of RL).

We find empirically that at least 2RL is necessary to ensure optimal sparse recovery, with spikes further apart for more coherent matrices. Under the assumption of a sparse signal with 2RL-separated spikes, we compare the four algorithms in terms of success rate. Denote by $x_r$ a reconstructed solution from a certain algorithm. We consider an algorithm successful if the relative error of $x_r$ to the true solution $\bar x$ is less than $10^{-3}$, i.e. $\frac{\|x_r - \bar x\|_2}{\|\bar x\|_2} < 10^{-3}$. The success rate is averaged over 50 random realizations. Fig. 4.3 shows the success rates of the algorithms with the factor $F$ increasing from 2 to 20. The sensing matrix is of size $100 \times 1500$. It is interesting to see that, along with the increase of the value $F$, the DCA $\ell_{1-2}$ algorithm performs better and better, especially at large $F$, where it has the highest success rate of all. Meanwhile, reweighted $\ell_{1/2}$ is better for low-coherence matrices; at large $F$ it is almost impossible for it to recover the sparse solution for the highly coherent matrix. Our DCATL, however, is more robust and consistently performs near the top, sometimes even the best. So it is a valuable choice for solving sparse optimization problems where the coherence of the sensing matrix is unknown.

We further look at the success rates of DCATL with different combinations of sparsity and separation lengths for the over-sampled DCT matrix $A$. The rates are recorded in Table 4.1, which shows that when the separation is above the minimum length, the sparsity relative to $M$ plays the more important role in determining the success rates of recovery.

4.3. Numerical Experiments for the Constrained Algorithm

For the constrained algorithms, we performed similar numerical experiments. An algorithm is considered successful if the relative error of the numerical result $x_r$ from the ground truth $\bar x$ is less than $10^{-3}$, i.e. $\frac{\|x_r - \bar x\|_2}{\|\bar x\|_2} < 10^{-3}$. We did 50 trials to compute average success rates for all the numerical experiments, as for the unconstrained algorithms. The stopping conditions for the outer loop are a relative iteration error $\frac{\|x^{n+1} - x^n\|_2}{\|x^{n+1}\|_2} < 10^{-5}$ and a maximum number of iteration steps. For the inner loop, the stopping conditions are a relative iteration error of $10^{-5}$ and a maximum number of iteration steps. The other comparison methods are applied with their default parameters.

4.3.1. Gaussian Random Matrices

We fix the parameters $(M, N) = (64, 1024)$, while the covariance parameter $r$ is varied between 0 and 1. The comparison is with the reweighted $\ell_{1/2}$ method and two $\ell_1$ algorithms (Bregman and YALL1). In Fig. 4.4, we see that Lp_RLS is the best among the four algorithms, with DCATL trailing not far behind.
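For the over-sampled DCT tests above, the 2RL-separated sparse signals and the success criterion can be generated and checked as sketched below; the rejection-sampling strategy and the function names are illustrative, not the paper's implementation.

```python
import numpy as np

def sparse_signal_min_sep(N, k, sep, seed=None):
    """Random k-sparse signal in R^N whose spikes are at least `sep` indices apart
    (sep = 2*RL = 2*F in the over-sampled DCT tests). Simple rejection sampling,
    adequate when k*sep is well below N."""
    rng = np.random.default_rng(seed)
    while True:
        support = np.sort(rng.choice(N, size=k, replace=False))
        if k == 1 or np.min(np.diff(support)) >= sep:
            break
    x = np.zeros(N)
    x[support] = rng.standard_normal(k)
    return x

def is_success(x_rec, x_true, tol=1e-3):
    """Success criterion used throughout Section 4: relative l2 error below 1e-3."""
    return np.linalg.norm(x_rec - x_true) / np.linalg.norm(x_true) < tol
```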

Fig. 4.3: Success rates of the unconstrained algorithms (DCATL, IRucLq-v, DCA $\ell_{1-2}$, CEL0) for over-sampled DCT matrices with $M = 100$, $N = 1500$, different values of $F$, and peaks of the solutions separated by 2RL = 2F.

4.3.2. Over-sampled DCT

We fix $(M, N) = (100, 1500)$ and vary the parameter $F$ from 2 to 20, so that the coherence of these matrices has a wider range and almost reaches 1 at the high end. In Fig. 4.5, when $F$ is small, say $F = 2, 4$, Lp_RLS still performs the best, similar to the case of Gaussian matrices. However, with increasing $F$, the success rate of Lp_RLS declines quickly, becoming worse than the Bregman $\ell_1$ algorithm at larger $F$. The performance of DCATL is very stable and maintains a high level consistently, even at the very high end of coherence ($F = 20$).

Remark 4.1. In view of the evaluation results in this section (Fig. 4.2, Fig. 4.3, Fig. 4.4, Fig. 4.5), we see that the DCATL algorithms offer robust solutions of CS problems for random sensing matrices with a broad range of coherence. In applications where the sensing hardware cannot be modified or upgraded, a robust recovery algorithm is a valuable tool for information retrieval. An example is super-resolution, where sparse signals are recovered from low-frequency measurements within the hardware resolution limit [7, 22].

5. Comparison of DCA on Different Non-convex Penalties

In this section, we compare DCA on other non-convex penalty functions such as PiE [3], MCP [46], and SCAD [5]. The computation is based on our DCA-ADMM scheme, which uses Algorithm 1 to solve the unconstrained optimization and Algorithm 2 to solve the subproblems. The DCA schemes of these penalty functions are the same as in [, 3]. Among the three penalties, PiE has one hyperparameter while MCP and SCAD have two hyperparameters. We used Gaussian random matrices to select the best parameters for these penalties. All other parameters of the DCA algorithms during the numerical experiments are the same as for DCA-TL1. The success rate curves are shown in Fig. 5.1.

Fig. 4.4: Success rates of the constrained algorithms (Bregman, DCATL, Lp_RLS, YALL1) for Gaussian random matrices with different coherence parameters $r$. The data points are averaged over 50 trials.

Table 5.1: Three non-convex penalty functions and their parameter values in the numerical experiments.
PiE: $\phi(t) = 1 - e^{-\beta|t|}$, with $\beta = 1$.
MCP: $\phi(t) = \frac{\alpha\beta^2}{2} - \frac{[(\alpha\beta - |t|)_+]^2}{2\alpha}$, with $\alpha = 5$, $\beta = 0.1$.
SCAD: $\phi(t) = \beta|t|$ if $|t| \le \beta$; $\ \phi(t) = -\frac{t^2 - 2\alpha\beta|t| + \beta^2}{2(\alpha-1)}$ if $\beta < |t| \le \alpha\beta$; $\ \phi(t) = \frac{(\alpha+1)\beta^2}{2}$ if $|t| > \alpha\beta$; with $\alpha = 5$, $\beta = 0.1$.

The DCA-MCP algorithm has very good performance on all Gaussian and over-sampled DCT matrices. In all the experiments, DCA-TL1 achieves almost the same level of success rates as DCA-MCP. Consistent with Remark 2.2 and with the set of matrices satisfying the SCAD gNSP being smaller than those of (PiE, TL1, MCP), SCAD is behind in the two plots of the first column of Fig. 5.1, where the sensing matrices are in the incoherent regime.
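The three penalties of Table 5.1 can be evaluated as sketched below; the default parameter values mirror the table, and the function names are illustrative.

```python
import numpy as np

def pie(t, beta=1.0):
    """PiE penalty from Table 5.1: phi(t) = 1 - exp(-beta * |t|)."""
    return 1.0 - np.exp(-beta * np.abs(t))

def mcp(t, alpha=5.0, beta=0.1):
    """MCP from Table 5.1: phi(t) = alpha*beta^2/2 - ((alpha*beta - |t|)_+)^2 / (2*alpha)."""
    return alpha * beta**2 / 2.0 - np.maximum(alpha * beta - np.abs(t), 0.0)**2 / (2.0 * alpha)

def scad(t, alpha=5.0, beta=0.1):
    """SCAD from Table 5.1 (piecewise: linear, quadratic, constant)."""
    t = np.abs(t)
    return np.where(t <= beta, beta * t,
           np.where(t <= alpha * beta,
                    -(t**2 - 2.0 * alpha * beta * t + beta**2) / (2.0 * (alpha - 1.0)),
                    (alpha + 1.0) * beta**2 / 2.0))
```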

21 .9 F = 2.9 F = 4.9 F = 6 Bregman DCATL Lp_RLS yall Bregman DCATL Lp_RLS yall.3. Bregman DCATL Lp_RLS yall F = Bregman DCATL Lp_RLS yall F = 2 Bregman DCATL Lp_RLS yall F = 2 Bregman DCATL Lp_RLS yall Fig. 4.5: Comparison of success rates of constrained algorithms for the over-sampled DCT random matrices: (M, N) = (, 5) with different F values, peak separation by 2RL = 2F. highly coherent regime (for over-sampled DCT matrices at F = 2), DCA-SCAD fares well. From PiE behind MCP, TL in Fig. 5., and l p (in view of Fig. 4.2), the gnsp of PiE is likely to be more restrictive than those of MCP, TL and l p, while the gnsp s of the latter three are rather close. A precise characterization of these gnsp s will be interesting for a future work. It is worth pointing out that DCA-TL has only one hyperparameter and so is easier to adjust and adapt to different tasks. Two hyperparameters give more parameter space for improvement, but also require more efforts to search for optimal values. 6. Concluding Remarks We have studied compressed sensing problems with the transformed l penalty function (TL) for both unconstrained and constrained models with random sensing matrices of a broad range of coherence. We discussed exact and stable recovery properties of TL using null space property and restricted isometry property of sensing matrices. We showed two DC algorithms along with a convergence theory. In numerical experiments, DCATL with ADMM solving the convex sub-problems is on par with the best method reweighted l /2 (Lp RLS) in the unconstrained (constrained) model, in case of incoherent Gaussian sensing matrices. For highly coherent over-sampled DCT random matrices, DCATL with ADMM solving the convex subproblems is also comparable with the best method DCA l l 2 algorithm. For random matrices of varying degree of coherence, the DCATL algorithm is the most robust for constrained and unconstrained models alike. We tested DCA with ADMM inner solver on other non-convex penalties (PiE, SCAD, MCP) with one and two hyperparameters, and found DCATL to be competitive as well. 2

Fig. 5.1: Success rates of the DCA-ADMM scheme with different non-convex penalties (TL1, PiE, MCP, SCAD) for random Gaussian matrices and $100 \times 1500$ over-sampled DCT matrices of varying $(r, F)$ values.

In future work, we plan to develop TL1 algorithms for image processing and machine learning applications.

Acknowledgments. The authors would like to thank Professor Wenjiang Fu for referring us to [25], Professor Jong-Shi Pang for his helpful suggestions, and the anonymous referees for their constructive comments.

Appendix A. Proof of Exact TL1 Sparse Recovery (Theorem 2.2).

Proof. The proof is along the lines of the arguments in [4] and [9], while using special properties of the penalty function $\rho_a$. For simplicity, we denote $\beta_C$ by $\beta$ and $\bar\beta_C$ by $\bar\beta$. Let $e = \beta - \bar\beta$; we want to prove that the vector $e = 0$. It is clear that $e_{T^c} = \beta_{T^c}$, since $T$ is the support set of $\bar\beta$. By the triangle inequality of $\rho_a$, we have
$$P_a(\bar\beta) - P_a(e_T) = P_a(\bar\beta) - P_a(-e_T) \le P_a(\bar\beta + e_T) = P_a(\beta_T).$$
It follows that
$$P_a(\bar\beta) - P_a(e_T) + P_a(e_{T^c}) \le P_a(\beta_T) + P_a(\beta_{T^c}) = P_a(\beta) \le P_a(\bar\beta) \ \Longrightarrow\ P_a(\beta_{T^c}) = P_a(e_{T^c}) \le P_a(e_T). \tag{A.1}$$
Now let us arrange the components of $e$ on $T^c$ in the order of decreasing magnitude and partition $T^c$ into $L$ parts: $T^c = T_1 \cup T_2 \cup \ldots \cup T_L$, where each $T_j$ has $R$ elements (except

possibly $T_L$ with fewer). Also denote $T_0 = T$ and $T_{01} = T_0 \cup T_1$. Since $Ae = A(\beta - \bar\beta) = 0$, it follows that
$$0 = \|Ae\|_2 = \Big\|A_{T_{01}}e_{T_{01}} + \sum_{j=2}^{L}A_{T_j}e_{T_j}\Big\|_2 \ge \|A_{T_{01}}e_{T_{01}}\|_2 - \sum_{j=2}^{L}\|A_{T_j}e_{T_j}\|_2 \ge \sqrt{1 - \delta_{|T|+R}}\,\|e_{T_{01}}\|_2 - \sqrt{1 + \delta_R}\sum_{j=2}^{L}\|e_{T_j}\|_2. \tag{A.2}$$
At the next step, we derive two inequalities between the $\ell_2$ norm and the function $P_a$, in order to use the inequality (A.1). Since
$$\rho_a(|t|) = \frac{(a+1)|t|}{a+|t|} \le \frac{a+1}{a}|t| = \Big(1 + \frac{1}{a}\Big)|t|,$$
we have
$$P_a(e_T) = \sum_{i \in T}\rho_a(|e_i|) \le \Big(1 + \frac{1}{a}\Big)\|e_T\|_1 \le \Big(1 + \frac{1}{a}\Big)\sqrt{|T|}\,\|e_T\|_2 \le \Big(1 + \frac{1}{a}\Big)\sqrt{|T|}\,\|e_{T_{01}}\|_2. \tag{A.3}$$
Now we estimate the $\ell_2$ norm of $e_{T_j}$ from above in terms of $P_a$. It follows from $\beta$ being the minimizer of the problem (2.12) and the definition of $x^0_C$ in (2.9) that we have
$$P_a(\beta_{T^c}) \le P_a(\beta) \le P_a(x^0_C) \le 1.$$
For each $i \in T^c$, $\rho_a(\beta_i) \le P_a(\beta_{T^c}) \le 1$, which implies $|\beta_i| \le 1$. Also, since $e_i = \beta_i$ and
$$\frac{(a+1)|\beta_i|}{a+|\beta_i|} \ge \frac{(a+1)|\beta_i|}{a+1} = |\beta_i|,$$
we have $|e_i| \le \rho_a(|e_i|)$ for every $i \in T^c$. It is known that the function $\rho_a(t)$ is increasing for the non-negative variable $t$, and $|e_i| \le |e_k|$ for $i \in T_j$ and $k \in T_{j-1}$, where $j = 2, 3, \ldots, L$. Thus we have
$$|e_i| \le \rho_a(|e_i|) \le \frac{P_a(e_{T_{j-1}})}{R}, \qquad \|e_{T_j}\|_2^2 \le \frac{P_a(e_{T_{j-1}})^2}{R}, \qquad \|e_{T_j}\|_2 \le \frac{P_a(e_{T_{j-1}})}{R^{1/2}}, \tag{A.4}$$
$$\sum_{j=2}^{L}\|e_{T_j}\|_2 \le \sum_{j=1}^{L-1}\frac{P_a(e_{T_j})}{R^{1/2}} \le \frac{P_a(e_{T^c})}{R^{1/2}}. \tag{A.5}$$

Finally, plug (A.3) and (A.5) into the inequality (A.2) to get
$$0 \ge \sqrt{1 - \delta_{|T|+R}}\,\frac{a}{a+1}\,\frac{P_a(e_T)}{\sqrt{|T|}} - \sqrt{1 + \delta_R}\,\frac{P_a(e_{T^c})}{\sqrt{R}} \ge \frac{P_a(e_T)}{\sqrt{R}}\Big(\frac{a}{a+1}\sqrt{\frac{R}{|T|}}\sqrt{1 - \delta_{R+|T|}} - \sqrt{1 + \delta_R}\Big). \tag{A.6}$$
By the RIP condition (2.13), the factor $\frac{a}{a+1}\sqrt{\frac{R}{|T|}}\sqrt{1 - \delta_{R+|T|}} - \sqrt{1 + \delta_R}$ is strictly positive, hence $P_a(e_T) = 0$ and $e_T = 0$. Also, by the inequality (A.1), $e_{T^c} = 0$. We have proved that $\beta_C = \bar\beta_C$, and the equivalence of (2.12) and (2.11) holds. If another vector is an optimal solution of (2.12), we can prove that it is also equal to $\bar\beta_C$ using the same procedure. Hence $\beta_C$ is unique.

Appendix B. Proof of Stable TL1 Sparse Recovery (Theorem 2.3).

Proof. Set $\epsilon^n = A\beta^n_C - y_C$. We use three notations below: (i) $\beta^n_C$, the optimal solution of the constrained problem (2.14); (ii) $\beta_C$, the optimal solution of the constrained problem (2.12); (iii) $\bar\beta_C$, the optimal solution of the $\ell_0$ problem (2.11). Let $T$ be the support set of $\bar\beta_C$, i.e. $T = \mathrm{supp}(\bar\beta_C)$, and let the vector $e = \beta^n_C - \bar\beta_C$. Following the proof of Theorem 2.2, we obtain
$$\sum_{j=2}^{L}\|e_{T_j}\|_2 \le \sum_{j=1}^{L-1}\frac{P_a(e_{T_j})}{R^{1/2}} \le \frac{P_a(e_{T^c})}{R^{1/2}} \quad \text{and} \quad \|e_{T_{01}}\|_2 \ge \frac{a}{(a+1)\sqrt{|T|}}\,P_a(e_T).$$
Further, due to the inequality $P_a(\beta^n_{T^c}) = P_a(e_{T^c}) \le P_a(e_T)$ from (A.1) and the inequalities in (A.2), we get
$$\|Ae\|_2 \ge \frac{P_a(e_T)}{R^{1/2}}\,C_\delta, \quad \text{where } C_\delta = \frac{a}{a+1}\sqrt{\frac{R}{|T|}}\sqrt{1 - \delta_{R+|T|}} - \sqrt{1 + \delta_R}.$$
By the initial assumption on the size of the observation noise, we have
$$\|Ae\|_2 = \|A\beta^n_C - A\bar\beta_C\|_2 = \|\epsilon^n\|_2 \le \tau, \tag{B.1}$$
so we have $P_a(e_T) \le \tau R^{1/2}/C_\delta$. On the other hand, we know that $P_a(\bar\beta_C) \le 1$ and that $\bar\beta_C$ is in the feasible set of the problem (2.14). Thus we have the inequality $P_a(\beta^n_C) \le P_a(\bar\beta_C)$. As in the derivation of (A.4), $|\beta^n_{C,i}| \le 1$ for each $i$, so we have
$$|\beta^n_{C,i}| \le \rho_a(|\beta^n_{C,i}|). \tag{B.2}$$

Minimizing the Difference of L 1 and L 2 Norms with Applications

Minimizing the Difference of L 1 and L 2 Norms with Applications 1/36 Minimizing the Difference of L 1 and L 2 Norms with Department of Mathematical Sciences University of Texas Dallas May 31, 2017 Partially supported by NSF DMS 1522786 2/36 Outline 1 A nonconvex approach:

More information

SPARSE signal representations have gained popularity in recent

SPARSE signal representations have gained popularity in recent 6958 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 57, NO. 10, OCTOBER 2011 Blind Compressed Sensing Sivan Gleichman and Yonina C. Eldar, Senior Member, IEEE Abstract The fundamental principle underlying

More information

Sparsity Regularization

Sparsity Regularization Sparsity Regularization Bangti Jin Course Inverse Problems & Imaging 1 / 41 Outline 1 Motivation: sparsity? 2 Mathematical preliminaries 3 l 1 solvers 2 / 41 problem setup finite-dimensional formulation

More information

Introduction to Compressed Sensing

Introduction to Compressed Sensing Introduction to Compressed Sensing Alejandro Parada, Gonzalo Arce University of Delaware August 25, 2016 Motivation: Classical Sampling 1 Motivation: Classical Sampling Issues Some applications Radar Spectral

More information

Sparse Recovery via Partial Regularization: Models, Theory and Algorithms

Sparse Recovery via Partial Regularization: Models, Theory and Algorithms Sparse Recovery via Partial Regularization: Models, Theory and Algorithms Zhaosong Lu and Xiaorui Li Department of Mathematics, Simon Fraser University, Canada {zhaosong,xla97}@sfu.ca November 23, 205

More information

Optimization for Compressed Sensing

Optimization for Compressed Sensing Optimization for Compressed Sensing Robert J. Vanderbei 2014 March 21 Dept. of Industrial & Systems Engineering University of Florida http://www.princeton.edu/ rvdb Lasso Regression The problem is to solve

More information

Least Sparsity of p-norm based Optimization Problems with p > 1

Least Sparsity of p-norm based Optimization Problems with p > 1 Least Sparsity of p-norm based Optimization Problems with p > Jinglai Shen and Seyedahmad Mousavi Original version: July, 07; Revision: February, 08 Abstract Motivated by l p -optimization arising from

More information

Lecture: Introduction to Compressed Sensing Sparse Recovery Guarantees

Lecture: Introduction to Compressed Sensing Sparse Recovery Guarantees Lecture: Introduction to Compressed Sensing Sparse Recovery Guarantees http://bicmr.pku.edu.cn/~wenzw/bigdata2018.html Acknowledgement: this slides is based on Prof. Emmanuel Candes and Prof. Wotao Yin

More information

STAT 200C: High-dimensional Statistics

STAT 200C: High-dimensional Statistics STAT 200C: High-dimensional Statistics Arash A. Amini May 30, 2018 1 / 57 Table of Contents 1 Sparse linear models Basis Pursuit and restricted null space property Sufficient conditions for RNS 2 / 57

More information

Iterative Reweighted Minimization Methods for l p Regularized Unconstrained Nonlinear Programming

Iterative Reweighted Minimization Methods for l p Regularized Unconstrained Nonlinear Programming Iterative Reweighted Minimization Methods for l p Regularized Unconstrained Nonlinear Programming Zhaosong Lu October 5, 2012 (Revised: June 3, 2013; September 17, 2013) Abstract In this paper we study

More information

Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation

Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation Instructor: Moritz Hardt Email: hardt+ee227c@berkeley.edu Graduate Instructor: Max Simchowitz Email: msimchow+ee227c@berkeley.edu

More information

Elaine T. Hale, Wotao Yin, Yin Zhang

Elaine T. Hale, Wotao Yin, Yin Zhang , Wotao Yin, Yin Zhang Department of Computational and Applied Mathematics Rice University McMaster University, ICCOPT II-MOPTA 2007 August 13, 2007 1 with Noise 2 3 4 1 with Noise 2 3 4 1 with Noise 2

More information

Computing Sparse Representation in a Highly Coherent Dictionary Based on Difference of L 1 and L 2

Computing Sparse Representation in a Highly Coherent Dictionary Based on Difference of L 1 and L 2 Computing Sparse Representation in a Highly Coherent Dictionary Based on Difference of L and L 2 Yifei Lou, Penghang Yin, Qi He and Jack Xin Abstract We study analytical and numerical properties of the

More information

Reconstruction from Anisotropic Random Measurements

Reconstruction from Anisotropic Random Measurements Reconstruction from Anisotropic Random Measurements Mark Rudelson and Shuheng Zhou The University of Michigan, Ann Arbor Coding, Complexity, and Sparsity Workshop, 013 Ann Arbor, Michigan August 7, 013

More information

Stability and Robustness of Weak Orthogonal Matching Pursuits

Stability and Robustness of Weak Orthogonal Matching Pursuits Stability and Robustness of Weak Orthogonal Matching Pursuits Simon Foucart, Drexel University Abstract A recent result establishing, under restricted isometry conditions, the success of sparse recovery

More information

IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 57, NO. 11, NOVEMBER On the Performance of Sparse Recovery

IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 57, NO. 11, NOVEMBER On the Performance of Sparse Recovery IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 57, NO. 11, NOVEMBER 2011 7255 On the Performance of Sparse Recovery Via `p-minimization (0 p 1) Meng Wang, Student Member, IEEE, Weiyu Xu, and Ao Tang, Senior

More information

Sparse Approximation via Penalty Decomposition Methods

Sparse Approximation via Penalty Decomposition Methods Sparse Approximation via Penalty Decomposition Methods Zhaosong Lu Yong Zhang February 19, 2012 Abstract In this paper we consider sparse approximation problems, that is, general l 0 minimization problems

More information

1 Computing with constraints

1 Computing with constraints Notes for 2017-04-26 1 Computing with constraints Recall that our basic problem is minimize φ(x) s.t. x Ω where the feasible set Ω is defined by equality and inequality conditions Ω = {x R n : c i (x)

More information

New Coherence and RIP Analysis for Weak. Orthogonal Matching Pursuit

New Coherence and RIP Analysis for Weak. Orthogonal Matching Pursuit New Coherence and RIP Analysis for Wea 1 Orthogonal Matching Pursuit Mingrui Yang, Member, IEEE, and Fran de Hoog arxiv:1405.3354v1 [cs.it] 14 May 2014 Abstract In this paper we define a new coherence

More information

Lecture Notes 9: Constrained Optimization

Lecture Notes 9: Constrained Optimization Optimization-based data analysis Fall 017 Lecture Notes 9: Constrained Optimization 1 Compressed sensing 1.1 Underdetermined linear inverse problems Linear inverse problems model measurements of the form

More information

Introduction to Alternating Direction Method of Multipliers

Introduction to Alternating Direction Method of Multipliers Introduction to Alternating Direction Method of Multipliers Yale Chang Machine Learning Group Meeting September 29, 2016 Yale Chang (Machine Learning Group Meeting) Introduction to Alternating Direction

More information

Compressed Sensing and Sparse Recovery

Compressed Sensing and Sparse Recovery ELE 538B: Sparsity, Structure and Inference Compressed Sensing and Sparse Recovery Yuxin Chen Princeton University, Spring 217 Outline Restricted isometry property (RIP) A RIPless theory Compressed sensing

More information

Large-Scale L1-Related Minimization in Compressive Sensing and Beyond

Large-Scale L1-Related Minimization in Compressive Sensing and Beyond Large-Scale L1-Related Minimization in Compressive Sensing and Beyond Yin Zhang Department of Computational and Applied Mathematics Rice University, Houston, Texas, U.S.A. Arizona State University March

More information

An Introduction to Sparse Approximation

An Introduction to Sparse Approximation An Introduction to Sparse Approximation Anna C. Gilbert Department of Mathematics University of Michigan Basic image/signal/data compression: transform coding Approximate signals sparsely Compress images,

More information

Sparse Optimization Lecture: Dual Certificate in l 1 Minimization

Sparse Optimization Lecture: Dual Certificate in l 1 Minimization Sparse Optimization Lecture: Dual Certificate in l 1 Minimization Instructor: Wotao Yin July 2013 Note scriber: Zheng Sun Those who complete this lecture will know what is a dual certificate for l 1 minimization

More information

Constrained optimization

Constrained optimization Constrained optimization DS-GA 1013 / MATH-GA 2824 Optimization-based Data Analysis http://www.cims.nyu.edu/~cfgranda/pages/obda_fall17/index.html Carlos Fernandez-Granda Compressed sensing Convex constrained

More information

1 Sparsity and l 1 relaxation

1 Sparsity and l 1 relaxation 6.883 Learning with Combinatorial Structure Note for Lecture 2 Author: Chiyuan Zhang Sparsity and l relaxation Last time we talked about sparsity and characterized when an l relaxation could recover the

More information

Compressed sensing. Or: the equation Ax = b, revisited. Terence Tao. Mahler Lecture Series. University of California, Los Angeles

Compressed sensing. Or: the equation Ax = b, revisited. Terence Tao. Mahler Lecture Series. University of California, Los Angeles Or: the equation Ax = b, revisited University of California, Los Angeles Mahler Lecture Series Acquiring signals Many types of real-world signals (e.g. sound, images, video) can be viewed as an n-dimensional

More information

ECE 8201: Low-dimensional Signal Models for High-dimensional Data Analysis

ECE 8201: Low-dimensional Signal Models for High-dimensional Data Analysis ECE 8201: Low-dimensional Signal Models for High-dimensional Data Analysis Lecture 3: Sparse signal recovery: A RIPless analysis of l 1 minimization Yuejie Chi The Ohio State University Page 1 Outline

More information

Conditions for Robust Principal Component Analysis

Conditions for Robust Principal Component Analysis Rose-Hulman Undergraduate Mathematics Journal Volume 12 Issue 2 Article 9 Conditions for Robust Principal Component Analysis Michael Hornstein Stanford University, mdhornstein@gmail.com Follow this and

More information

ECE 8201: Low-dimensional Signal Models for High-dimensional Data Analysis

ECE 8201: Low-dimensional Signal Models for High-dimensional Data Analysis ECE 8201: Low-dimensional Signal Models for High-dimensional Data Analysis Lecture 7: Matrix completion Yuejie Chi The Ohio State University Page 1 Reference Guaranteed Minimum-Rank Solutions of Linear

More information

A New Estimate of Restricted Isometry Constants for Sparse Solutions

A New Estimate of Restricted Isometry Constants for Sparse Solutions A New Estimate of Restricted Isometry Constants for Sparse Solutions Ming-Jun Lai and Louis Y. Liu January 12, 211 Abstract We show that as long as the restricted isometry constant δ 2k < 1/2, there exist

More information

Sparse regression. Optimization-Based Data Analysis. Carlos Fernandez-Granda

Sparse regression. Optimization-Based Data Analysis.   Carlos Fernandez-Granda Sparse regression Optimization-Based Data Analysis http://www.cims.nyu.edu/~cfgranda/pages/obda_spring16 Carlos Fernandez-Granda 3/28/2016 Regression Least-squares regression Example: Global warming Logistic

More information

Optimization methods

Optimization methods Lecture notes 3 February 8, 016 1 Introduction Optimization methods In these notes we provide an overview of a selection of optimization methods. We focus on methods which rely on first-order information,

More information

Sparsity and Compressed Sensing

Sparsity and Compressed Sensing Sparsity and Compressed Sensing Jalal Fadili Normandie Université-ENSICAEN, GREYC Mathematical coffees 2017 Recap: linear inverse problems Dictionary Sensing Sensing Sensing = y m 1 H m n y y 2 R m H A

More information

c 2015 Society for Industrial and Applied Mathematics

c 2015 Society for Industrial and Applied Mathematics SIAM J. SCI. COMPUT. Vol. 37, No., pp. A536 A563 c 205 Society for Industrial and Applied Mathematics MINIMIZATION OF l 2 FOR COMPRESSED SENSING PENGHANG YIN, YIFEI LOU, QI HE, AND JACK XIN Abstract. We

More information

Linear Convergence of Adaptively Iterative Thresholding Algorithms for Compressed Sensing

Linear Convergence of Adaptively Iterative Thresholding Algorithms for Compressed Sensing Linear Convergence of Adaptively Iterative Thresholding Algorithms for Compressed Sensing Yu Wang, Jinshan Zeng, Zhimin Peng, Xiangyu Chang, and Zongben Xu arxiv:48.689v3 [math.oc] 6 Dec 5 Abstract This

More information

4TE3/6TE3. Algorithms for. Continuous Optimization

4TE3/6TE3. Algorithms for. Continuous Optimization 4TE3/6TE3 Algorithms for Continuous Optimization (Algorithms for Constrained Nonlinear Optimization Problems) Tamás TERLAKY Computing and Software McMaster University Hamilton, November 2005 terlaky@mcmaster.ca

More information

1 Regression with High Dimensional Data

1 Regression with High Dimensional Data 6.883 Learning with Combinatorial Structure ote for Lecture 11 Instructor: Prof. Stefanie Jegelka Scribe: Xuhong Zhang 1 Regression with High Dimensional Data Consider the following regression problem:

More information

Sparse and Robust Optimization and Applications

Sparse and Robust Optimization and Applications Sparse and and Statistical Learning Workshop Les Houches, 2013 Robust Laurent El Ghaoui with Mert Pilanci, Anh Pham EECS Dept., UC Berkeley January 7, 2013 1 / 36 Outline Sparse Sparse Sparse Probability

More information

AN AUGMENTED LAGRANGIAN AFFINE SCALING METHOD FOR NONLINEAR PROGRAMMING

AN AUGMENTED LAGRANGIAN AFFINE SCALING METHOD FOR NONLINEAR PROGRAMMING AN AUGMENTED LAGRANGIAN AFFINE SCALING METHOD FOR NONLINEAR PROGRAMMING XIAO WANG AND HONGCHAO ZHANG Abstract. In this paper, we propose an Augmented Lagrangian Affine Scaling (ALAS) algorithm for general

More information

Bayesian Paradigm. Maximum A Posteriori Estimation

Bayesian Paradigm. Maximum A Posteriori Estimation Bayesian Paradigm Maximum A Posteriori Estimation Simple acquisition model noise + degradation Constraint minimization or Equivalent formulation Constraint minimization Lagrangian (unconstraint minimization)

More information

Interpolation-Based Trust-Region Methods for DFO

Interpolation-Based Trust-Region Methods for DFO Interpolation-Based Trust-Region Methods for DFO Luis Nunes Vicente University of Coimbra (joint work with A. Bandeira, A. R. Conn, S. Gratton, and K. Scheinberg) July 27, 2010 ICCOPT, Santiago http//www.mat.uc.pt/~lnv

More information

arxiv: v1 [math.oc] 23 May 2017

arxiv: v1 [math.oc] 23 May 2017 A DERANDOMIZED ALGORITHM FOR RP-ADMM WITH SYMMETRIC GAUSS-SEIDEL METHOD JINCHAO XU, KAILAI XU, AND YINYU YE arxiv:1705.08389v1 [math.oc] 23 May 2017 Abstract. For multi-block alternating direction method

More information

Sparsity Models. Tong Zhang. Rutgers University. T. Zhang (Rutgers) Sparsity Models 1 / 28

Sparsity Models. Tong Zhang. Rutgers University. T. Zhang (Rutgers) Sparsity Models 1 / 28 Sparsity Models Tong Zhang Rutgers University T. Zhang (Rutgers) Sparsity Models 1 / 28 Topics Standard sparse regression model algorithms: convex relaxation and greedy algorithm sparse recovery analysis:

More information

Sparse Solutions of Systems of Equations and Sparse Modelling of Signals and Images

Sparse Solutions of Systems of Equations and Sparse Modelling of Signals and Images Sparse Solutions of Systems of Equations and Sparse Modelling of Signals and Images Alfredo Nava-Tudela ant@umd.edu John J. Benedetto Department of Mathematics jjb@umd.edu Abstract In this project we are

More information

Introduction How it works Theory behind Compressed Sensing. Compressed Sensing. Huichao Xue. CS3750 Fall 2011

Introduction How it works Theory behind Compressed Sensing. Compressed Sensing. Huichao Xue. CS3750 Fall 2011 Compressed Sensing Huichao Xue CS3750 Fall 2011 Table of Contents Introduction From News Reports Abstract Definition How it works A review of L 1 norm The Algorithm Backgrounds for underdetermined linear

More information

Douglas-Rachford splitting for nonconvex feasibility problems

Douglas-Rachford splitting for nonconvex feasibility problems Douglas-Rachford splitting for nonconvex feasibility problems Guoyin Li Ting Kei Pong Jan 3, 015 Abstract We adapt the Douglas-Rachford DR) splitting method to solve nonconvex feasibility problems by studying

More information

Compressed Sensing via Partial l 1 Minimization

Compressed Sensing via Partial l 1 Minimization WORCESTER POLYTECHNIC INSTITUTE Compressed Sensing via Partial l 1 Minimization by Lu Zhong A thesis Submitted to the Faculty of the WORCESTER POLYTECHNIC INSTITUTE in partial fulfillment of the requirements

More information

You should be able to...

You should be able to... Lecture Outline Gradient Projection Algorithm Constant Step Length, Varying Step Length, Diminishing Step Length Complexity Issues Gradient Projection With Exploration Projection Solving QPs: active set

More information

Compressibility of Infinite Sequences and its Interplay with Compressed Sensing Recovery

Compressibility of Infinite Sequences and its Interplay with Compressed Sensing Recovery Compressibility of Infinite Sequences and its Interplay with Compressed Sensing Recovery Jorge F. Silva and Eduardo Pavez Department of Electrical Engineering Information and Decision Systems Group Universidad

More information

Recovery of Sparse Signals from Noisy Measurements Using an l p -Regularized Least-Squares Algorithm

Recovery of Sparse Signals from Noisy Measurements Using an l p -Regularized Least-Squares Algorithm Recovery of Sparse Signals from Noisy Measurements Using an l p -Regularized Least-Squares Algorithm J. K. Pant, W.-S. Lu, and A. Antoniou University of Victoria August 25, 2011 Compressive Sensing 1 University

More information

Recent Developments in Compressed Sensing

Recent Developments in Compressed Sensing Recent Developments in Compressed Sensing M. Vidyasagar Distinguished Professor, IIT Hyderabad m.vidyasagar@iith.ac.in, www.iith.ac.in/ m vidyasagar/ ISL Seminar, Stanford University, 19 April 2018 Outline

More information

Mark your answers ON THE EXAM ITSELF. If you are not sure of your answer you may wish to provide a brief explanation.

Mark your answers ON THE EXAM ITSELF. If you are not sure of your answer you may wish to provide a brief explanation. CS 189 Spring 2015 Introduction to Machine Learning Midterm You have 80 minutes for the exam. The exam is closed book, closed notes except your one-page crib sheet. No calculators or electronic items.

More information

Exact penalty decomposition method for zero-norm minimization based on MPEC formulation 1

Exact penalty decomposition method for zero-norm minimization based on MPEC formulation 1 Exact penalty decomposition method for zero-norm minimization based on MPEC formulation Shujun Bi, Xiaolan Liu and Shaohua Pan November, 2 (First revised July 5, 22) (Second revised March 2, 23) (Final

More information

SOLVING NON-CONVEX LASSO TYPE PROBLEMS WITH DC PROGRAMMING. Gilles Gasso, Alain Rakotomamonjy and Stéphane Canu

SOLVING NON-CONVEX LASSO TYPE PROBLEMS WITH DC PROGRAMMING. Gilles Gasso, Alain Rakotomamonjy and Stéphane Canu SOLVING NON-CONVEX LASSO TYPE PROBLEMS WITH DC PROGRAMMING Gilles Gasso, Alain Rakotomamonjy and Stéphane Canu LITIS - EA 48 - INSA/Universite de Rouen Avenue de l Université - 768 Saint-Etienne du Rouvray

More information

On the Local Quadratic Convergence of the Primal-Dual Augmented Lagrangian Method

On the Local Quadratic Convergence of the Primal-Dual Augmented Lagrangian Method Optimization Methods and Software Vol. 00, No. 00, Month 200x, 1 11 On the Local Quadratic Convergence of the Primal-Dual Augmented Lagrangian Method ROMAN A. POLYAK Department of SEOR and Mathematical

More information

Sparse Solutions of an Undetermined Linear System

Sparse Solutions of an Undetermined Linear System 1 Sparse Solutions of an Undetermined Linear System Maddullah Almerdasy New York University Tandon School of Engineering arxiv:1702.07096v1 [math.oc] 23 Feb 2017 Abstract This work proposes a research

More information

COMPARATIVE ANALYSIS OF ORTHOGONAL MATCHING PURSUIT AND LEAST ANGLE REGRESSION

COMPARATIVE ANALYSIS OF ORTHOGONAL MATCHING PURSUIT AND LEAST ANGLE REGRESSION COMPARATIVE ANALYSIS OF ORTHOGONAL MATCHING PURSUIT AND LEAST ANGLE REGRESSION By Mazin Abdulrasool Hameed A THESIS Submitted to Michigan State University in partial fulfillment of the requirements for

More information

DS-GA 1002 Lecture notes 0 Fall Linear Algebra. These notes provide a review of basic concepts in linear algebra.

DS-GA 1002 Lecture notes 0 Fall Linear Algebra. These notes provide a review of basic concepts in linear algebra. DS-GA 1002 Lecture notes 0 Fall 2016 Linear Algebra These notes provide a review of basic concepts in linear algebra. 1 Vector spaces You are no doubt familiar with vectors in R 2 or R 3, i.e. [ ] 1.1

More information

Coordinate Update Algorithm Short Course Proximal Operators and Algorithms

Coordinate Update Algorithm Short Course Proximal Operators and Algorithms Coordinate Update Algorithm Short Course Proximal Operators and Algorithms Instructor: Wotao Yin (UCLA Math) Summer 2016 1 / 36 Why proximal? Newton s method: for C 2 -smooth, unconstrained problems allow

More information

A Bregman alternating direction method of multipliers for sparse probabilistic Boolean network problem

A Bregman alternating direction method of multipliers for sparse probabilistic Boolean network problem A Bregman alternating direction method of multipliers for sparse probabilistic Boolean network problem Kangkang Deng, Zheng Peng Abstract: The main task of genetic regulatory networks is to construct a

More information

Structured matrix factorizations. Example: Eigenfaces

Structured matrix factorizations. Example: Eigenfaces Structured matrix factorizations Example: Eigenfaces An extremely large variety of interesting and important problems in machine learning can be formulated as: Given a matrix, find a matrix and a matrix

More information

Frank-Wolfe Method. Ryan Tibshirani Convex Optimization

Frank-Wolfe Method. Ryan Tibshirani Convex Optimization Frank-Wolfe Method Ryan Tibshirani Convex Optimization 10-725 Last time: ADMM For the problem min x,z f(x) + g(z) subject to Ax + Bz = c we form augmented Lagrangian (scaled form): L ρ (x, z, w) = f(x)

More information

Minimization of Transformed L 1 Penalty: Closed Form Representation and Iterative Thresholding Algorithms

Minimization of Transformed L 1 Penalty: Closed Form Representation and Iterative Thresholding Algorithms Minimization of Transformed L Penalty: Closed Form Representation and Iterative Thresholding Algorithms Shuai Zhang, and Jack Xin arxiv:4.54v [cs.it] 8 Oct 6 Abstract The transformed l penalty (TL) functions

More information

Master 2 MathBigData. 3 novembre CMAP - Ecole Polytechnique

Master 2 MathBigData. 3 novembre CMAP - Ecole Polytechnique Master 2 MathBigData S. Gaïffas 1 3 novembre 2014 1 CMAP - Ecole Polytechnique 1 Supervised learning recap Introduction Loss functions, linearity 2 Penalization Introduction Ridge Sparsity Lasso 3 Some

More information

Sparsity in Underdetermined Systems

Sparsity in Underdetermined Systems Sparsity in Underdetermined Systems Department of Statistics Stanford University August 19, 2005 Classical Linear Regression Problem X n y p n 1 > Given predictors and response, y Xβ ε = + ε N( 0, σ 2

More information

Optimization methods

Optimization methods Optimization methods Optimization-Based Data Analysis http://www.cims.nyu.edu/~cfgranda/pages/obda_spring16 Carlos Fernandez-Granda /8/016 Introduction Aim: Overview of optimization methods that Tend to

More information

TRANSFORMED SCHATTEN-1 ITERATIVE THRESHOLDING ALGORITHMS FOR LOW RANK MATRIX COMPLETION

TRANSFORMED SCHATTEN-1 ITERATIVE THRESHOLDING ALGORITHMS FOR LOW RANK MATRIX COMPLETION TRANSFORMED SCHATTEN- ITERATIVE THRESHOLDING ALGORITHMS FOR LOW RANK MATRIX COMPLETION SHUAI ZHANG, PENGHANG YIN, AND JACK XIN Abstract. We study a non-convex low-rank promoting penalty function, the transformed

More information

An efficient ADMM algorithm for high dimensional precision matrix estimation via penalized quadratic loss

An efficient ADMM algorithm for high dimensional precision matrix estimation via penalized quadratic loss An efficient ADMM algorithm for high dimensional precision matrix estimation via penalized quadratic loss arxiv:1811.04545v1 [stat.co] 12 Nov 2018 Cheng Wang School of Mathematical Sciences, Shanghai Jiao

More information

University of Luxembourg. Master in Mathematics. Student project. Compressed sensing. Supervisor: Prof. I. Nourdin. Author: Lucien May

University of Luxembourg. Master in Mathematics. Student project. Compressed sensing. Supervisor: Prof. I. Nourdin. Author: Lucien May University of Luxembourg Master in Mathematics Student project Compressed sensing Author: Lucien May Supervisor: Prof. I. Nourdin Winter semester 2014 1 Introduction Let us consider an s-sparse vector

More information

Sparse Optimization Lecture: Dual Methods, Part I

Sparse Optimization Lecture: Dual Methods, Part I Sparse Optimization Lecture: Dual Methods, Part I Instructor: Wotao Yin July 2013 online discussions on piazza.com Those who complete this lecture will know dual (sub)gradient iteration augmented l 1 iteration

More information

Motivation Sparse Signal Recovery is an interesting area with many potential applications. Methods developed for solving sparse signal recovery proble

Motivation Sparse Signal Recovery is an interesting area with many potential applications. Methods developed for solving sparse signal recovery proble Bayesian Methods for Sparse Signal Recovery Bhaskar D Rao 1 University of California, San Diego 1 Thanks to David Wipf, Zhilin Zhang and Ritwik Giri Motivation Sparse Signal Recovery is an interesting

More information

ABSTRACT. Recovering Data with Group Sparsity by Alternating Direction Methods. Wei Deng

ABSTRACT. Recovering Data with Group Sparsity by Alternating Direction Methods. Wei Deng ABSTRACT Recovering Data with Group Sparsity by Alternating Direction Methods by Wei Deng Group sparsity reveals underlying sparsity patterns and contains rich structural information in data. Hence, exploiting

More information

Convex Optimization and l 1 -minimization

Convex Optimization and l 1 -minimization Convex Optimization and l 1 -minimization Sangwoon Yun Computational Sciences Korea Institute for Advanced Study December 11, 2009 2009 NIMS Thematic Winter School Outline I. Convex Optimization II. l

More information

Orthogonal Matching Pursuit for Sparse Signal Recovery With Noise

Orthogonal Matching Pursuit for Sparse Signal Recovery With Noise Orthogonal Matching Pursuit for Sparse Signal Recovery With Noise The MIT Faculty has made this article openly available. Please share how this access benefits you. Your story matters. Citation As Published

More information

The Alternating Direction Method of Multipliers

The Alternating Direction Method of Multipliers The Alternating Direction Method of Multipliers With Adaptive Step Size Selection Peter Sutor, Jr. Project Advisor: Professor Tom Goldstein December 2, 2015 1 / 25 Background The Dual Problem Consider

More information

10725/36725 Optimization Homework 4

10725/36725 Optimization Homework 4 10725/36725 Optimization Homework 4 Due November 27, 2012 at beginning of class Instructions: There are four questions in this assignment. Please submit your homework as (up to) 4 separate sets of pages

More information

ON THE GLOBAL AND LINEAR CONVERGENCE OF THE GENERALIZED ALTERNATING DIRECTION METHOD OF MULTIPLIERS

ON THE GLOBAL AND LINEAR CONVERGENCE OF THE GENERALIZED ALTERNATING DIRECTION METHOD OF MULTIPLIERS ON THE GLOBAL AND LINEAR CONVERGENCE OF THE GENERALIZED ALTERNATING DIRECTION METHOD OF MULTIPLIERS WEI DENG AND WOTAO YIN Abstract. The formulation min x,y f(x) + g(y) subject to Ax + By = b arises in

More information

Uniform Uncertainty Principle and signal recovery via Regularized Orthogonal Matching Pursuit

Uniform Uncertainty Principle and signal recovery via Regularized Orthogonal Matching Pursuit Uniform Uncertainty Principle and signal recovery via Regularized Orthogonal Matching Pursuit arxiv:0707.4203v2 [math.na] 14 Aug 2007 Deanna Needell Department of Mathematics University of California,

More information

Signal Recovery from Permuted Observations

Signal Recovery from Permuted Observations EE381V Course Project Signal Recovery from Permuted Observations 1 Problem Shanshan Wu (sw33323) May 8th, 2015 We start with the following problem: let s R n be an unknown n-dimensional real-valued signal,

More information

Robust Principal Component Analysis

Robust Principal Component Analysis ELE 538B: Mathematics of High-Dimensional Data Robust Principal Component Analysis Yuxin Chen Princeton University, Fall 2018 Disentangling sparse and low-rank matrices Suppose we are given a matrix M

More information

A New Trust Region Algorithm Using Radial Basis Function Models

A New Trust Region Algorithm Using Radial Basis Function Models A New Trust Region Algorithm Using Radial Basis Function Models Seppo Pulkkinen University of Turku Department of Mathematics July 14, 2010 Outline 1 Introduction 2 Background Taylor series approximations

More information

HYBRID JACOBIAN AND GAUSS SEIDEL PROXIMAL BLOCK COORDINATE UPDATE METHODS FOR LINEARLY CONSTRAINED CONVEX PROGRAMMING

HYBRID JACOBIAN AND GAUSS SEIDEL PROXIMAL BLOCK COORDINATE UPDATE METHODS FOR LINEARLY CONSTRAINED CONVEX PROGRAMMING SIAM J. OPTIM. Vol. 8, No. 1, pp. 646 670 c 018 Society for Industrial and Applied Mathematics HYBRID JACOBIAN AND GAUSS SEIDEL PROXIMAL BLOCK COORDINATE UPDATE METHODS FOR LINEARLY CONSTRAINED CONVEX

More information

Bias-free Sparse Regression with Guaranteed Consistency

Bias-free Sparse Regression with Guaranteed Consistency Bias-free Sparse Regression with Guaranteed Consistency Wotao Yin (UCLA Math) joint with: Stanley Osher, Ming Yan (UCLA) Feng Ruan, Jiechao Xiong, Yuan Yao (Peking U) UC Riverside, STATS Department March

More information

Preliminary/Qualifying Exam in Numerical Analysis (Math 502a) Spring 2012

Preliminary/Qualifying Exam in Numerical Analysis (Math 502a) Spring 2012 Instructions Preliminary/Qualifying Exam in Numerical Analysis (Math 502a) Spring 2012 The exam consists of four problems, each having multiple parts. You should attempt to solve all four problems. 1.

More information

Robust Principal Component Analysis Based on Low-Rank and Block-Sparse Matrix Decomposition

Robust Principal Component Analysis Based on Low-Rank and Block-Sparse Matrix Decomposition Robust Principal Component Analysis Based on Low-Rank and Block-Sparse Matrix Decomposition Gongguo Tang and Arye Nehorai Department of Electrical and Systems Engineering Washington University in St Louis

More information

Sparse Recovery of Streaming Signals Using. M. Salman Asif and Justin Romberg. Abstract

Sparse Recovery of Streaming Signals Using. M. Salman Asif and Justin Romberg. Abstract Sparse Recovery of Streaming Signals Using l 1 -Homotopy 1 M. Salman Asif and Justin Romberg arxiv:136.3331v1 [cs.it] 14 Jun 213 Abstract Most of the existing methods for sparse signal recovery assume

More information

Data Sparse Matrix Computation - Lecture 20

Data Sparse Matrix Computation - Lecture 20 Data Sparse Matrix Computation - Lecture 20 Yao Cheng, Dongping Qi, Tianyi Shi November 9, 207 Contents Introduction 2 Theorems on Sparsity 2. Example: A = [Φ Ψ]......................... 2.2 General Matrix

More information

arxiv: v3 [math.oc] 19 Oct 2017

arxiv: v3 [math.oc] 19 Oct 2017 Gradient descent with nonconvex constraints: local concavity determines convergence Rina Foygel Barber and Wooseok Ha arxiv:703.07755v3 [math.oc] 9 Oct 207 0.7.7 Abstract Many problems in high-dimensional

More information

Structural and Multidisciplinary Optimization. P. Duysinx and P. Tossings

Structural and Multidisciplinary Optimization. P. Duysinx and P. Tossings Structural and Multidisciplinary Optimization P. Duysinx and P. Tossings 2018-2019 CONTACTS Pierre Duysinx Institut de Mécanique et du Génie Civil (B52/3) Phone number: 04/366.91.94 Email: P.Duysinx@uliege.be

More information

Point Source Super-resolution Via Non-convex L 1 Based Methods

Point Source Super-resolution Via Non-convex L 1 Based Methods Noname manuscript No. (will be inserted by the editor) Point Source Super-resolution Via Non-convex L 1 Based Methods Yifei Lou Penghang Yin Jack Xin Abstract We study the super-resolution (SR) problem

More information

Sparse Optimization Lecture: Basic Sparse Optimization Models

Sparse Optimization Lecture: Basic Sparse Optimization Models Sparse Optimization Lecture: Basic Sparse Optimization Models Instructor: Wotao Yin July 2013 online discussions on piazza.com Those who complete this lecture will know basic l 1, l 2,1, and nuclear-norm

More information

ACCELERATED FIRST-ORDER PRIMAL-DUAL PROXIMAL METHODS FOR LINEARLY CONSTRAINED COMPOSITE CONVEX PROGRAMMING

ACCELERATED FIRST-ORDER PRIMAL-DUAL PROXIMAL METHODS FOR LINEARLY CONSTRAINED COMPOSITE CONVEX PROGRAMMING ACCELERATED FIRST-ORDER PRIMAL-DUAL PROXIMAL METHODS FOR LINEARLY CONSTRAINED COMPOSITE CONVEX PROGRAMMING YANGYANG XU Abstract. Motivated by big data applications, first-order methods have been extremely

More information

SPARSE SIGNAL RESTORATION. 1. Introduction

SPARSE SIGNAL RESTORATION. 1. Introduction SPARSE SIGNAL RESTORATION IVAN W. SELESNICK 1. Introduction These notes describe an approach for the restoration of degraded signals using sparsity. This approach, which has become quite popular, is useful

More information

U.C. Berkeley CS294: Beyond Worst-Case Analysis Handout 12 Luca Trevisan October 3, 2017

U.C. Berkeley CS294: Beyond Worst-Case Analysis Handout 12 Luca Trevisan October 3, 2017 U.C. Berkeley CS94: Beyond Worst-Case Analysis Handout 1 Luca Trevisan October 3, 017 Scribed by Maxim Rabinovich Lecture 1 In which we begin to prove that the SDP relaxation exactly recovers communities

More information

Coordinate descent. Geoff Gordon & Ryan Tibshirani Optimization /

Coordinate descent. Geoff Gordon & Ryan Tibshirani Optimization / Coordinate descent Geoff Gordon & Ryan Tibshirani Optimization 10-725 / 36-725 1 Adding to the toolbox, with stats and ML in mind We ve seen several general and useful minimization tools First-order methods

More information

AN INTRODUCTION TO COMPRESSIVE SENSING

AN INTRODUCTION TO COMPRESSIVE SENSING AN INTRODUCTION TO COMPRESSIVE SENSING Rodrigo B. Platte School of Mathematical and Statistical Sciences APM/EEE598 Reverse Engineering of Complex Dynamical Networks OUTLINE 1 INTRODUCTION 2 INCOHERENCE

More information

Iterative Methods for Solving A x = b

Iterative Methods for Solving A x = b Iterative Methods for Solving A x = b A good (free) online source for iterative methods for solving A x = b is given in the description of a set of iterative solvers called templates found at netlib: http

More information

Assignment 1: From the Definition of Convexity to Helley Theorem

Assignment 1: From the Definition of Convexity to Helley Theorem Assignment 1: From the Definition of Convexity to Helley Theorem Exercise 1 Mark in the following list the sets which are convex: 1. {x R 2 : x 1 + i 2 x 2 1, i = 1,..., 10} 2. {x R 2 : x 2 1 + 2ix 1x

More information